I'm trying to round an input double using a specified rounding mode in inline assembly in C. To do so, I need to grab the FPU control word using fstcw and then change the bits in the word. Unfortunately I'm encountering an error on the very first line:
double roundD(double n, RoundingMode roundingMode) {
    asm("fstcw %%ax \n"
        ::: "ax"); // clobbers
    return n;
}
The assembler error I receive is:
Error: operand type mismatch for 'fstcw'.
I'm under the impression this code snippet should store the FPU control word, which is 16 bits in length, in the AX register, which is also 16 bits in length. Just to be sure, I also tested the above code with the EAX register instead of AX, and received the same error.
What might I be missing here? Please let me know if any further information is needed.
fstcw (store control word) only works with a memory destination operand, not a register.
Perhaps you're getting mixed up with fstsw (store status word), which has a separate form (a separate opcode) where the destination is AX instead of being specified by an addressing mode.
That was useful for branching efficiently on an FP compare result (before fcomi existed to compare straight into EFLAGS), which is needed far more often than anything involving the control word. That's why there's an AX-destination version of fnstsw but not of fnstcw.
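For illustration, a minimal sketch (my wording, not the asker's code) of reading the control word into a C variable through a memory operand, which is what fnstcw requires:

#include <stdint.h>

/* Minimal sketch: store the x87 control word into a C variable.
   fnstcw only accepts a 16-bit memory operand, so use an "=m" constraint. */
static inline uint16_t read_x87_control_word(void)
{
    uint16_t cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m"(cw));
    return cw;
}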
And BTW, you can set the rounding mode from plain C: #include <fenv.h>
Or far better, if SSE4.1 is available, use roundsd (or the intrinsic) to do one rounding with a custom rounding mode, without setting / restoring the SSE rounding mode (in MXCSR, which is totally separate from the x87 rounding mode).
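If you do go the C route, here is a minimal sketch using <fenv.h> (the helper name and the save/restore policy are mine, assuming the mode only needs to be in effect around a single rounding):

#include <fenv.h>
#include <math.h>

/* Round n in the given rounding mode (FE_TONEAREST, FE_DOWNWARD,
   FE_UPWARD or FE_TOWARDZERO), then restore the previous mode. */
double roundWithMode(double n, int mode)
{
    int old = fegetround();   /* save the current rounding mode */
    fesetround(mode);
    double r = nearbyint(n);  /* rounds according to the current mode */
    fesetround(old);          /* restore */
    return r;
}

(Strictly speaking, code that changes the rounding mode should also use #pragma STDC FENV_ACCESS ON, and linking may need -lm.)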
I'm trying to eliminate as many gcc warnings as possible in some old code, trying to get "cleaner" code by compiling it with a more recent toolchain.
The code writes and reads registers (or memory) on ARMv6 hardware, I can't say I completely understand what it actually does but that's the gist of it for this particular line of the code in question. Side note, all the storage types are uint32.
When looking at it on the C source, it's just a bunch of macros with only 1 value being passed on, for example:
writel(readl(ADDR_GPIO_IOTR1)&(~(3<<IOTR_GPIO(26))),ADDR_GPIO_IOTR1);
That line and many others, where that 26 is replaced by other values (30, 58, 59, which I presume are GPIO "pins"), are generating the warning "left shift count is negative".
When looking at the preprocessed code, the bit (~(3<<IOTR_GPIO(26))) turns out to be:
(~(3<<(~(3<<((26%16)<<1)))))
That is clearly a negative left shift: no matter the value passed to the macro, the bitwise complement operator is going to turn the result of the inner shift 3<<anything into a negative number.
Considering that all of those 3s are inferred to be of type "int" (they are signed), the result of that operation should always be 0xffffffff, right?
So, IOTR_GPIO(GPIO) is defined as (~(3<<((GPIO%16)<<1))). I wrote a testcase to see what the compiler will do in each step for any value of GPIO I pass on the command line; this is what I get for a run with 26 as the value of GPIO:
26%16=0x0000000a [0b1010]
0xa << 1=0x00000014 [0b10100]
3 << 0x14=0x00300000 [0b1100000000000000000000]
~(0x300000)=0xffcfffff [0b11111111110011111111111111111111]
So far, a negative int32, as expected.
3 << 0xffcfffff=0x80000000 [0b10000000000000000000000000000000]
Now what is going on here? I'm pretty sure that shift should have zeroed out everything.
~(0x80000000)=0x7fffffff [0b1111111111111111111111111111111]
So, no, I'm not getting 0xffffffff after all; regardless, I still get 0x7fffffff for almost all values (it changes when GPIO is between 0 and 3).
However, here is what happens when I print the result of the whole preprocessed code with a fixed value:
(~(3<<(~(3<<((26%16)<<1)))))=
0xffffffff [0b11111111111111111111111111111111]
The clear difference is that for my step-by-step test the compiler does not know the value of GPIO beforehand, as I'm passing that as an argument to my test program. When printing the result of the preprocessed code the compiler has optimized out the value at compile time and returns what I had expected.
So why isn't that negative shift returning all zeros for my testcase, besides the fact that negative shifts are undefined behavior?
A question to myself is "how the heck is this actually working?" I truly don't expect an answer to that.
But I would like at least an opinion, considering:
I have replicated the compilation of this bit of code 1:1 in a testcase (same toolchain, same gcc arguments as the running code).
I even ran the testcase on the ARMv6 hardware in question, and I got the exact same results as with a modern gcc-5.3.0 on x86_64 (with or without -m32, as I'm storing everything in uint32_t).
There are no other versions of these lines anywhere to be found in history; as far as I can deduce, they were added to "support a new chip" (guessing from the couple of #ifdefs around this).
What could the intention of the programmer in this case have been? Even the original toolchain spits out the exact same warning, so I don't think it was ignored.
What I may really be asking is "how was this intentional?".
Might it be that at some other point (linking perhaps?) something changes and a different result is being used? Kind of hard to duplicate/testcase/inspect that I think. But I'm going to put a printf there somewhere and run it just to make sure I'm not going crazy.
The testcase I made: negative_shift_test.c
The original, unmodified messed up code: starts here
The complete, indented preprocessed line (#L3093 in the linked code above):
({
do {
__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory");
outer_sync();
} while (0);
(
(void)(
(void)0,
*(volatile unsigned int *)(((((0x088CE000) - 0x00000000 + 0xf0000000) + 0x004))) =
(( u32) (( __le32)(__u32)(
({
u32 __v = ({
u32 __v = (( __u32)(__le32)(( __le32) ((void)0, *(volatile unsigned int *)(((((0x088CE000) - 0x00000000 + 0xf0000000) + 0x004))))));
__v;
});
__asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" : : "r" (0) : "memory");
__v;
}) & (~(3<<(~(3<<((26%16)<<1))))) /* sits here unmolested */
)))
)
);
});
(read an address, bitwise AND (&) the result of the read, and write that back to the same address, if I understood it correctly).
One side of the problem is just this:
I wrote a testcase
As you said yourself, you wrote a testcase for an unreadable piece of code that happens to work despite hitting undefined behavior. It's really no big surprise that your testcase does something unexpected and different. Touching anything in that code can dispel the magic; you can't just deconstruct it and run it bit by bit. Changing the toolchain can also break it, BTW.
Without digging further into what gcc might do with the code, assuming that the facts are all true, this question is unanswerable because it contains a contradiction:
So why isn't that negative shift returning all zeros for my testcase?,
besides the fact that negative shifts are undefined behavior?
You seem to expect undefined behavior to have some defined behavior...
OTOH, the question below is easy to answer:
A question to myself is "how the heck is this actually working?" I
truly don't expect an answer to that.
The answer is "why not ?". UB can be the behavior that the author expects, as it's "defined" (hum) as any behavior.
So the actual problem is this:
The code writes and reads registers (or memory) on ARMv6 hardware, I
can't say I completely understand what it actually does
You can't refactor it without understanding it. That involves finding the author, and torturing him (or her) if necessary. No need to torture other innocent people.
PS oh and that question is easy too:
What could the intention of the programmer in this case have been?
Even the original toolchain spits out the exact same warning, so I don't
think it was ignored.
That's called an evil programmer. One more reason to find him.
PPS I'm betting on a bug: the author forgot that IOTR_GPIO already does the ~(3<< shift, so it gets done twice. The to-infinity-and-beyond second shift doesn't make sense.
First and foremost, the source in question belongs to the open-sourced kernel for a specific Samsung Galaxy model (the GT-S5367).
That being said, that model belongs to the bcm21553 family of boards, for which there were many source code zip packages released by Samsung.
In that family the S5360 is a whole "sub" family with many variants, the Totoro board. The S5367 is also a Totoro board.
When looking for different versions of the same file to spot differences in these lines that performed the negative left shifts, I restricted my search to the S5360 alone; suffice it to say, I found no differences: every single source had the same bug.
After a while testing with many printk() in the kernel and looking at the generated output, I decided to search on github for the dubious macro itself, IOTR_GPIO.
In doing so I found many duplicates of the macro definition in sources derived from my own sub-family (plenty of board-totoro.c).
But then, to my surprise, a different board, the Torino (still based on the bcm21553), had the same macro definition but without the extra negative shift.
So (I'm assuming) this ended up being just a copy-paste bug. I believe the intention was to move the mask into (or out of) the macro definition, but the programmer forgot to remove the code on the other side.
The code worked fine because all it does is read a value, AND it against a mask (created by this macro), and write it back to the same spot.
Since the actual, working mask just places two zero bits at a position that depends on the GPIO pin, whenever the value being read (and bitwise AND'ed) already has those two bits cleared, the full, bogus 0xffffffff mask makes no difference compared with the proper mask, and thus the code works fine, even with such a nasty bug in place.
TL;DR: as @ilya pointed out in a comment, the correct macro definition is:
#define IOTR_GPIO(GPIO) ((GPIO%16)<<1)
No negative shift there to worry about, the bitwise complement is done afterwards to create the actual mask and not to shift bits again.
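For illustration, here is a minimal standalone sketch of the corrected macro (not the kernel code), using the GPIO value 26 from the question:

#include <stdio.h>
#include <stdint.h>

#define IOTR_GPIO(GPIO) ((GPIO % 16) << 1)   /* corrected: just the bit position */

int main(void)
{
    /* GPIO 26 -> (26 % 16) << 1 == 20, so the mask clears bits 21:20 */
    uint32_t mask = ~(3u << IOTR_GPIO(26));
    printf("mask = 0x%08x\n", (unsigned)mask);   /* prints mask = 0xffcfffff */
    return 0;
}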
With that change the code compiles without warnings and works just fine as before.
PS Thanks @ilya for helping with the brainstorming :+).
I would like to ask how to determine in which ISA (ARM/Thumb/Thumb-2) an instruction is encoded.
First of all, I tried to do it following the instructions here (section 4.5.5).
However, when I use readelf -s ./arm_binary, and arm_binary was built in release mode, it appears that there is no .symtab in the binary. And anyway, I don't understand how to use this command to find the type for the instructions.
Secondly, I know the other way to differentiate is to look at the address of an ARM/Thumb instruction: if bit 0 is set (an odd address), it is a Thumb instruction; if it is clear, ARM. But how can I do this without loading the file into memory? When I parse the sections of the file and find the executable section, all I have is the start (offset) location in the file, and the file offset is always even; it will always be even because instructions are 2 or 4 bytes in size...
Finally, the last way to check is to detect BX Rm, extract the value from Rm, and then check whether the address in Rm is even or odd. But this may be difficult, because for that I would need to emulate the whole program.
So what is the correct way to identify the ISA for disassembly?
Thank you for your attention and I hope you will help me.
I don't believe it's possible to tell, in a mixed mode binary, without inspecting the instructions as you describe.
If the whole file will be one ISA or the other, then you can determine the ISA of the entry point by running this:
readelf -h ./arm_binary
And checking whether the entry point is even or odd.
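If you'd rather do the same check programmatically, here is a minimal sketch (Linux <elf.h> and a 32-bit ARM ELF assumed, with only basic error handling):

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    Elf32_Ehdr ehdr;
    if (fread(&ehdr, sizeof ehdr, 1, f) != 1) { fclose(f); return 1; }
    fclose(f);

    /* Bit 0 of the entry address is set for Thumb, clear for ARM. */
    printf("entry 0x%08x -> %s\n", (unsigned)ehdr.e_entry,
           (ehdr.e_entry & 1) ? "Thumb" : "ARM");
    return 0;
}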
However, what I would do is simply disassemble it both ways, and see what looks right. As long as you start the disassembly at the start of a function (or any 4-byte boundary), then this will work fine. Most code will produce nonsense when disassembled in the wrong ISA.
If all values are nothing more than one or more bytes, and no byte can contain metadata, how does the system keep track of what sort of number a byte represents? Looking into Two's Complement and Single Precision on Wikipedia reveals how these numbers can be represented in base two, but I'm still left wondering how the compiler or processor (not sure which I'm really dealing with here) determines that this byte must be a signed integer.
It is analogous to receiving an encrypted letter and, looking at my shelf of ciphers, wondering which one to grab. Some indicator is necessary.
If I think about what I might do to solve this problem, two solutions come to mind. Either I would claim an additional byte and use it to store a description, or I would allocate sections of memory specifically for numerical representations; a section for signed numbers, a section for floats, etc.
I'm dealing primarily with C on a Unix system but this may be a more general question.
how does the system keep track of what sort of number a byte represents?
"The system" doesn't. During translation, the compiler knows the types of the objects it's dealing with, and generates the appropriate machine instructions for dealing with those values.
Ooh, good question. Let's start with the CPU - assuming an Intel x86 chip.
It turns out the CPU does not know whether a byte is "signed" or "unsigned." So when you add two numbers - or do almost any operation - flags in a "status register" are set.
Take a look at the "sign flag." When you add two numbers, the CPU does just that - it adds the numbers and stores the result in a register. But the CPU also asks: "if we interpreted these numbers as two's complement signed integers, is the result negative?" If so, that "sign flag" is set to 1.
So if your program cares about signed vs unsigned and you are writing in assembly, you would check the status of that flag, and the rest of your program would perform a different task based on that flag.
So when you use signed int versus unsigned int in C, you are basically telling the compiler how (or whether) to use that sign flag.
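As a rough illustration (x86 and GCC/Clang assumed; compile with -O1 and inspect the assembly): the same a < b comparison is lowered to different flag tests depending on the declared signedness.

/* typically cmp + setl: "less" decided from the Sign and Overflow flags */
int less_signed(int a, int b)
{
    return a < b;
}

/* typically cmp + setb: "below" decided from the Carry flag */
int less_unsigned(unsigned a, unsigned b)
{
    return a < b;
}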
The code that is executed has no information about the types. The only tool that knows the types is the compiler at the time it compiles the code. Types in C are solely a restriction at compile time to prevent you from using the wrong type somewhere. While compiling, the C compiler keeps track of the type of each variable and therefore knows which type belongs to which variable.
This is the reason why you need to use format strings in printf, for example. printf has no chance of knowing what type it will get in the parameter list as this information is lost. In languages like go or java you have a runtime with reflection capabilities which makes it possible to get the type.
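In other words, the format string is the only thing telling printf how to interpret the bytes it receives; a small sketch:

#include <stdio.h>

int main(void)
{
    int    i = -23;
    double d = 0.1;

    /* printf cannot inspect the types of its variadic arguments;
       the "%d" and "%f" conversions are what carry that information. */
    printf("%d %f\n", i, d);
    return 0;
}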
Suppose your compiled C code still carried type information; then the resulting assembler language would need to check for types. It turns out that the only thing close to types in assembly is the size of an instruction's operands, determined by suffixes (in GAS). So what is left of your type information is the size, and nothing more.
One example of an assembly language that does support types is Java VM bytecode, which has typed instructions (iadd, dadd, ...) for the primitive types.
It is important to remember that C and C++ are high level languages. The compiler's job is to take the plain text representation of the code and build it into the platform specific instructions the target platform is expecting to execute. For most people using PCs this tends to be x86 assembly.
This is why C and C++ are so loose with how they define the basic data types. For example, most people say there are 8 bits in a byte. The exact width is not fixed by the standard (it only requires CHAR_BIT to be at least 8), and there is nothing against some machine out there having, say, 9 or 16 bits per byte as its native interpretation of data. The standard only recognizes that a byte is the smallest addressable unit of data.
So the interpretation of data is up to the instruction set of the processor. In many modern languages there is another abstraction on top of this, the Virtual Machine.
If you write your own scripting language it is up to you to define how you interpret your data in software.
In C, apart from the compiler, which knows perfectly well about the types of the given values, there is no part of the system that knows the type of a given value.
Note that C by itself doesn't bring any runtime type information system with it.
Take a look at the following example:
int i_var;
double d_var;

int main () {
    i_var = -23;
    d_var = 0.1;
    return 0;
}
In the code there are two different types of values involved: one to be stored as an integer and one to be stored as a double value.
The compiler that analyzes the code knows the exact types of both of them quite well. Here is a dump of a short fragment of the type information gcc held while generating code, obtained by passing -fdump-tree-all to gcc:
#1 type_decl name: #2 type: #3 srcp: <built-in>:0
chan: #4
#2 identifier_node strg: int lngt: 3
#3 integer_type name: #1 size: #5 algn: 32
prec: 32 sign: signed min : #6
max : #7
...
#5 integer_cst type: #11 low : 32
#6 integer_cst type: #3 high: -1 low : -2147483648
#7 integer_cst type: #3 low : 2147483647
...
#3805 var_decl name: #3810 type: #3 srcp: main.c:3
chan: #3811 size: #5 algn: 32
used: 1
...
#3810 identifier_node strg: i_var lngt: 5
Hunting down the #links you should clearly see that there really is a lot of information about memory size, alignment constraints and allowed min and max values for the type "int" stored in nodes #1-3 and #5-7. (I left out the #4 node, as the mentioned "chan" entry is just used to chain up any type definitions in the generated tree.)
Regarding the variable declared at main.c line 3, it is known that it holds a value of type int, as seen by the type reference to node #3.
You'll surely be able to hunt down the double entries and the ones for d_var in an experiment of your own, if you don't trust me that they are also there.
Taking a look at the generated assembler code (pass the -S switch to gcc), we can see how the compiler used this information in code generation:
.file "main.c"
.comm i_var,4,4
.comm d_var,8,8
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
movl $-23, i_var
fldl .LC0
fstpl d_var
movl $0, %eax
popl %ebp
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long -1717986918
.long 1069128089
.ident "GCC: (Debian 4.4.5-8) 4.4.5"
.section .note.GNU-stack,"",@progbits
Taking a look at the assignment instructions, you will see that the compiler figured out the right instructions: mov to assign our int value and fstp to assign our double value.
Nevertheless, besides the instructions chosen at the machine level, there is no indication of the type of those values. Looking at the value stored at .LC0, the double value 0.1 was even broken down into two consecutive storage locations, each a .long, to meet the "types" known to the assembler.
As a matter of fact, breaking the value up this way was just one choice among other possibilities; using 8 consecutive values of "type" .byte would have done equally well.
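That claim is easy to check from C (little-endian x86 assumed): reinterpreting the bytes of 0.1 as two 32-bit words reproduces the two .long constants from the listing.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    double d = 0.1;
    uint32_t words[2];

    memcpy(words, &d, sizeof d);    /* raw bytes, no conversion */

    /* On little-endian x86 this prints 0x9999999a 0x3fb99999, i.e. the
       .long values -1717986918 and 1069128089 emitted at .LC0 above. */
    printf("0x%08x 0x%08x\n", (unsigned)words[0], (unsigned)words[1]);
    return 0;
}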
I am working with a large C library where some array indices are computed using int.
I need to find a way to trap integer overflows at runtime in such a way as to narrow down the problematic line of code. The libc manual states:
FPE_INTOVF_TRAP
Integer overflow (impossible in a C program unless you enable overflow trapping in a hardware-specific fashion).
however gcc option -ffpe-trap suggests that those only apply to FP numbers?
So how do I enable integer overflow traps? My system is Xeon/Core2, gcc-4.x, Linux 2.6.
I have looked through similar questions, but they all boil down to modifying the code. I need to know which code is problematic in the first place, however.
If Xeons can't trap overflows, which processors can? I have access to non-emt64 machines as well.
Meanwhile I have found a tool designed for llvm: http://embed.cs.utah.edu/ioc/
There doesn't seem to be an equivalent for gcc/icc, however?
Ok, I may have to answer my own question.
I found that gcc has a -ftrapv option; a quick test confirms that, at least on my system, overflow is trapped. I will post more detailed info as I learn more, since it seems a very useful tool.
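For reference, the kind of quick test meant above (a minimal sketch; gcc with -ftrapv and no optimization assumed, since heavy optimization can fold the overflow away):

/* Build: gcc -ftrapv -O0 -g trapv_test.c
   The signed overflow below makes the program abort (SIGABRT), and the
   faulting line can then be located in gdb or from the core dump. */
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    int y = x + 1;        /* signed overflow: -ftrapv turns this into a trap */
    printf("%d\n", y);    /* not reached when the trap fires */
    return 0;
}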
Unsigned integer arithmetic does not overflow, of course (it is defined to wrap modulo 2^N).
With signed integer arithmetic, overflow leads to undefined behaviour; anything could happen. And optimizers are getting aggressive about optimizing stuff that overflows. So, your best bet is to avoid the overflow, rather than trapping it when it happens. Consider using the CERT 'Secure Integer Library' (the URL referenced there seems to have gone AWOL/404; I'm not sure what's happened yet) or Google's 'Safe Integer Operation' library.
If you must trap overflow, you are going to need to specify which platform you are interested in (O/S including version, compiler including version), because the answer will be very platform specific.
Do you know exactly which line the overflow is occurring on? If so, you might be able to look at the CPU's Overflow (signed) or Carry (unsigned) flag to see whether the operation in question overflowed. These are the flags the CPU uses for multi-word arithmetic and, while not available at the C level, they might help you to debug the problem - or at least give you a chance to do something.
BTW, found this link for gcc (-ftrapv) that talks about an integer trap. Might be what you are looking for.
You can use inline assembler in gcc to execute an instruction that might generate an overflow and then test the overflow flag to see if it actually did:
int addo(int a, int b)
{
    /* Do the add in a scratch register (ecx) so the input operands are
       not modified, and branch to the label if the overflow flag is set. */
    asm goto("movl %0, %%ecx; addl %1, %%ecx; jo %l[overflow]"
             : /* asm goto allows no output operands */
             : "r"(a), "r"(b)
             : "cc", "ecx"
             : overflow);
    return a + b;
overflow:
    return 0;
}
In this case, it tries to add a and b, and if that overflows, it goes to the overflow label. If there's no overflow, it continues, doing the add again (this time in C) and returning the result.
This runs into the GCC limitation that an inline asm block cannot both output a value and maybe branch -- if it weren't for that, you wouldn't need a second add to actually get the result.
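A quick usage sketch of the helper above (assuming the addo function is in the same file):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("%d\n", addo(2, 3));        /* prints 5 */
    printf("%d\n", addo(INT_MAX, 1));  /* prints 0: the jo branch was taken */
    return 0;
}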