This might be overly specific, but I'm posting it here as it might help someone else who's trying to compile/run the SPEC 2006 benchmarks outside the default SPEC benchmark harness. (Our reason for doing this is to compare compilation strategies and code coverage, while the SPEC harness focuses only on the performance of the resulting code.)
When performing a ref run of perlbench the benchmark crashes with a segmentation fault:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f6868 in S_regmatch (prog=0x832144)
at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:3024
3024 PL_reg_start_tmp[n] = locinput;
(gdb) bt
#0 0x00000000004f6868 in S_regmatch (prog=0x832144)
at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:3024
#1 0x00000000004f22cf in S_regtry (prog=0x8320c0, startpos=0x831e70 "o")
at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:2196
#2 0x00000000004eba71 in Perl_regexec_flags (prog=0x8320c0, stringarg=0x831e70 "o", strend=0x831e71 "",
strbeg=0x831e70 "o", minend=0, sv=0x7e2528, data=0x0, flags=3)
at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:1910
#3 0x00000000004b33bb in Perl_pp_match ()
at <path-to-spec>/CPU2006/400.perlbench/src/pp_hot.c:1340
#4 0x00000000004fcde4 in Perl_runops_standard ()
at <path-to-spec>/CPU2006/400.perlbench/src/run.c:37
#5 0x000000000046bf57 in S_run_body (oldscope=1)
at <path-to-spec>/CPU2006/400.perlbench/src/perl.c:2017
#6 0x000000000046b9f6 in perl_run (my_perl=0x7bf010)
at <path-to-spec>/CPU2006/400.perlbench/src/perl.c:1934
#7 0x000000000047add2 in main (argc=4, argv=0x7fffffffe178, env=0x7fffffffe1a0)
at <path-to-spec>/CPU2006/400.perlbench/src/perlmain.c:98
The execution environment is 64-bit Linux and the behaviour is observed with both the latest gcc and clang.
What causes this crash?
The segfault is caused by a garbage value of the variable n on the indicated line. Inspecting the code shows that the value comes from the field arg1 of an object of type:
struct regnode_1 {
U8 flags;
U8 type;
U16 next_off;
U32 arg1;
};
Inspecting the memory location of the object shows that it is not packed, i.e. there are 32 bits of padding between next_off and arg1:
(gdb) x/16xb scan
0x7f4978: 0xde 0x2d 0x02 0x00 0x00 0x00 0x00 0x00
0x7f4980: 0x00 0x11 0x0d 0x00 0x00 0x00 0x00 0x00
(gdb) print/x n
$1 = 0xd1100
This is suspicious. There's pointer and type conversion going on in perlbench, so perhaps type size assumptions fail somewhere. Compiling with multilib yields a working benchmark and examining the memory verifies that there is no padding.
Forcing the structure into a bitfield fixes the crash when performing a 64-bit compile:
struct regnode_1 {
U8 flags : 8;
U8 type : 8;
U16 next_off : 16;
U32 arg1 : 32;
};
This is how our little investigation progressed:
At first we thought it was some padding issue, but as Peter pointed out on Godbolt, no such thing occurs. So, the packing or not of the structure did not change anything.
Then, I got suspicious of the (clearly twisted) way that Perl handles pointers. The majority of the casts violate strict aliasing as defined by the standard. The segmentation fault happened on a pointer cast, namely from:
struct regnode {
U8 flags;
U8 type;
U16 next_off;
};
to
struct regnode_1 {
U8 flags;
U8 type;
U16 next_off;
U32 arg1;
};
However, enabling strict aliasing with the -fstrict-aliasing flag didn't change anything. Although the cast qualifies as undefined behaviour, there is no overlap in memory, since the elements/nodes of the regular expression that is currently being parsed are laid out separately in memory.
Going deeper and checking the LLVM IR for the switch block in question, I got this in regexec.ll:
; truncated
%876 = load %struct.regnode*, %struct.regnode** %scan, align 8, !dbg !8005
%877 = bitcast %struct.regnode* %876 to %struct.regnode_1*, !dbg !8005
%arg11715 = getelementptr inbounds %struct.regnode_1, %struct.regnode_1* %877, i32 0, i32 3, !dbg !8005
%878 = load i64, i64* %arg11715, align 8, !dbg !8005
store i64 %878, i64* %n, align 8, !dbg !8006
; truncated
The load/store instructions use a 64-bit integer, which means that the pointer in C is interpreted as pointing to an 8-byte integer (instead of a 4-byte one). As a result, the load picks up bytes outside the bounds of the current regex node when calculating the value of arg1. This value is in turn used as an array index, which ultimately causes a segfault when it is out of the array's bounds.
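The gdb dump from the question is consistent with this. As a rough check (load_le and scan_dump are names made up for this sketch, and it assumes arg1 sits at offset 8 as the padding in the dump suggests), assembling the eight bytes at offset 8 as a little-endian integer gives exactly the value gdb printed for n:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 16 bytes printed by `x/16xb scan` in the question. */
const unsigned char scan_dump[16] = {
    0xde, 0x2d, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x11, 0x0d, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* Assemble `width` bytes starting at `offset` as a little-endian
   integer (the byte order of x86-64). Hypothetical helper. */
uint64_t load_le(const unsigned char *bytes, size_t offset, size_t width)
{
    uint64_t value = 0;
    assert(width <= 8);
    for (size_t i = 0; i < width; i++) {
        value |= (uint64_t)bytes[offset + i] << (8 * i);
    }
    return value;
}
```

load_le(scan_dump, 8, 8) evaluates to 0xd1100, the same garbage value that ends up in n.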
Back to tracing where U32 is interpreted as a 64-bit unsigned integer. Looking into the file spec_config.h, the conditional compilation leads (at least on my machine) to a preprocessor block that starts with
#elif !defined(SPEC_CPU_GOOFY_DATAMODEL)
which, according to a code comment in the surrounding area, is supposed to correspond to an ILP32 data model (see also this). However, U32TYPE is defined as an unsigned long, which on my machine is 64 bits.
So, the fix is to change the definition to
#define U32TYPE uint32_t
which, as stated in this, is guaranteed to be exactly 32 bits (if supported).
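To see the effect of the type choice in isolation, here is a small sketch that is not taken from the SPEC sources. The two struct names and the helper functions are made up for illustration; the first struct mimics U32 being an unsigned long, the second uses uint32_t:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout when the "32-bit" type is really `unsigned long`. */
struct regnode_1_long {
    uint8_t flags;
    uint8_t type;
    uint16_t next_off;
    unsigned long arg1;
};

/* Layout with an exact-width 32-bit type. */
struct regnode_1_fixed {
    uint8_t flags;
    uint8_t type;
    uint16_t next_off;
    uint32_t arg1;
};

/* Byte offset of arg1 in each variant. */
size_t arg1_offset_long(void)
{
    return offsetof(struct regnode_1_long, arg1);
}

size_t arg1_offset_fixed(void)
{
    return offsetof(struct regnode_1_fixed, arg1);
}
```

On a typical 64-bit Linux target (LP64) the first offset is 8 and the struct is 16 bytes, while the uint32_t variant keeps arg1 at offset 4 in an 8-byte struct.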
I'd like to complement the other answers by saying that it was enough for us to add -DSPEC_CPU_LP64 to work around the segfault (-DSPEC_LP64 in CPU2017). Would be nice if the SPEC group would add this to their FAQ. This also seems to apply to gcc, cactusADM, povray and wrf.
We have a Python script generating the config files for us. I'll talk to people and see if I can share what we have so far to get it running for our compiler.
Edit: Seems to be accessible from the outside anyway, so here you go: spec.py
Related
I'm using JNI definitions from here. I create a JNINativeInterface_ with most members initialized to None. I then run native code which uses the RegisterNatives field of the aforementioned struct. I initialized RegisterNatives and surrounding fields as such:
SetDoubleArrayRegion: unsafe { transmute(0xdeadbeaf as u64) },
RegisterNatives: Some(register_natives),
UnregisterNatives: unsafe { transmute(0xdeadbeaf as u64) },
register_natives is defined like so (this matches the library type exactly):
unsafe extern "system" fn register_natives(env: *mut sys::JNIEnv,
clazz: jclass,
methods: *const JNINativeMethod,
nMethods: jint) -> jint {
unimplemented!()
}
The native code that uses the struct segfaults (and seems to get a null pointer instead of register_natives).
The relevant part of the struct looks like so under GDB:
0x7ffcf5f4a5b8: 0x0 0x0 0x0 0x0
0x7ffcf5f4a5c8: 0xdeadbeaf 0x0 0x43fd9950 0x55ea
0x7ffcf5f4a5d8: 0xdeadbeaf 0x0 0x0 0x0
0x7ffcf5f4a5e8: 0x0 0x0 0x0 0x0
I'm confused as to exactly what I am looking at, since I was expecting 0xdeadbeaf, followed by a 64-bit pointer, followed by 0xdeadbeaf, but as you can see that is not what I get. Am I wrong about my assumptions as to how Option will be represented behind the scenes? Why does bindgen/the aforementioned library seem to think that Option will lead to a compatible interface?
[...] I was expecting 0xdeadbeaf, followed by a 64 bit pointer, followed by 0xdeadbeaf, but as you can see that is not what I get.
We must not be seeing the same thing, because I do see that.
0x7ffcf5f4a5c8: 0xdeadbeaf 0x0 0x43fd9950 0x55ea
0x7ffcf5f4a5d8: 0xdeadbeaf 0x0 0x0 0x0
Each hex number is a 32-bit integer, so you have to take two of them to make a 64-bit integer. The first is 0x00000000deadbeaf, the second is 0x000055ea43fd9950 (your register_natives function, presumably) and the third is 0x00000000deadbeaf again. (It's also "obvious" from the addresses: a 64-bit integer takes 8 bytes, so it takes two to fill 0x10 bytes. Therefore, there are two 64-bit integers per line.)
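If it helps, the pairing can be written out explicitly. This is just an illustration in C (join_words is a made-up helper), assuming gdb printed the low word first, as it does on a little-endian machine:

```c
#include <stdint.h>

/* Join two 32-bit words, low word first, into one 64-bit value. */
uint64_t join_words(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

join_words(0x43fd9950, 0x55ea) gives 0x000055ea43fd9950, and join_words(0xdeadbeaf, 0x0) gives 0x00000000deadbeaf.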
The reason the program segfaults may be because letting a panic unwind through foreign code is undefined behavior. Try changing your register_natives function to something that doesn't panic.
For testing the MPU and playing around with exploits, I want to execute code from a local buffer running on my STM32F4 dev board.
int main(void)
{
uint16_t func[] = { 0x0301f103, 0x0301f103, 0x0301f103 };
MPU->CTRL = 0;
unsigned int address = (void*)&func+1;
asm volatile(
"mov r4,%0\n"
"ldr pc, [r4]\n"
:
: "r"(address)
);
while(1);
}
In main, I first turn off the MPU. My instructions are stored in func. In the ASM part I load the address (0x2001ffe8 +1 for thumb) into the program counter register. When stepping through the code with GDB, the correct value is stored in R4 and then transferred to the PC register. But then I end up in the HardFault handler.
Edit:
The stack looks like this:
0x2001ffe8: 0x0301f103 0x0301f103 0x0301f103 0x2001ffe9
The instructions are correct in the memory. Definitive Guide to Cortex says region 0x20000000–0x3FFFFFFF is the SRAM and "this region is executable, so you can copy program code here and execute it".
You are assigning 32 bit values to a 16 bit array.
Your instructions don't terminate; they continue on to run into whatever is found in RAM, so that will crash.
You are not loading the address of the array into the program counter; you are loading the first item in the array into the program counter. This will crash, because you created a level of indirection.
Look at the BX instruction for this rather than ldr pc
You did not declare the array as static, so the array can be optimized out as dead and unused, so this can cause it to crash.
The compiler should also complain that you are assigning a void* to an unsigned variable, so a typecast is wanted there.
As a habit I recommend address|=1 rather than +=1; in this case either will work.
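Two of the points above can be checked on any machine, without the board. This is only a sketch with made-up helper names, not code to flash:

```c
#include <stdint.h>

/* Converting a 32-bit value to uint16_t keeps only the low 16 bits,
   which is what happens to each initializer of a uint16_t array. */
uint16_t store_as_u16(uint32_t word)
{
    return (uint16_t)word;
}

/* Set bit 0 of a branch target to request Thumb state. */
uint32_t thumb_target(uint32_t address)
{
    return address | 1u;
}
```

store_as_u16(0x0301f103) yields 0xf103, so half of each instruction is lost. thumb_target(0x2001ffe8) yields 0x2001ffe9, the same as adding 1, but if the address already has bit 0 set, |1 leaves it alone while +1 would turn it into the even address 0x2001ffea.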
I'm working on a toy kernel for fun and education (not a class project). I'm starting work on my memory manager, so I'm trying to get the memory map from BIOS using an INT 0x15, EAX=E820 call while still in Real Mode. I'm adapting my function from the osdev wiki (here, in the section "Getting an E820 Memory Map"). However, I want this to be a function I can call from my C code, so I'm trying to change it a bit. I want it to take two arguments: a pointer to where to store the map entries, and a pointer to an integer which will be incremented by the number of entries in the table.
According to the wiki, ES:DI needs to be pointing at where the data should be stored, so I split my first argument into two (the segment selector, pointer_to_map / 16, and the offset, pointer_to_map % 16). Here's part of my C code:
typedef struct SMAP_entry {
unsigned int baseL; // Base address, a QWORD
unsigned int baseH;
unsigned int lengthL; // Length, a QWORD
unsigned int lengthH;
unsigned int type; // entry type
unsigned int ACPI; // extra data from ACPI 3.0
} SMAP_entry_t;
SMAP_entry_t data[100];
kprint("Pointer: ");
kprint_int((int) data, 16);
kprint_newline();
int res = 0;
read_mem_map(((int) data) / 16, ((int) data) % 16, &res);
kprint("res: ");
kprint_int(res, 16);
kprint_newline();
Here's part of my ASM code:
; performs a INT 0x15, eax=0xE820 call to find the memory map
; inputs: the pointer to the data table / 16, the pointer % 16, a pointer to an dword (int) which will be
; incremented by the number of entries after this function returns.
; preserves: no registers except esi
read_mem_map:
mov es, [esp + 4] ; set es to the value of the first argument
mov di, [esp + 8] ; set di to the value of the second argument
That's all I'm pasting in because the program triple-faults and shuts down the VM there. By moving ret commands around, I found that the function crashes on the very first line. If I comment out the call in C, then everything works as you'd expect.
I've read through Google that there's almost never a reason to set ES:DI directly, and in the code that I've found which does, they set it to a literal. How should I set ES:DI and if I shouldn't set it directly, how should I make the C and ASM interact in the correct way?
Each of the segment registers (on 80x86) has a visible part and several hidden fields (the segment base, the segment limit and the segment's attributes: read/write, privilege level, etc).
In protected mode, when you load a segment register the CPU uses the visible part as an index into either the GDT or LDT, and loads the segment's hidden fields from that descriptor (in the GDT or LDT).
In real mode, the CPU does something completely different: it only sets the segment base to "visible part * 16" and doesn't use any (GDT, LDT) table.
Given the fact that you're using a 32-bit pointer to the data table and a 32-bit stack pointer (e.g. mov es, [esp + 4]), I assume your C code is in 32-bit protected mode. This is completely incompatible with real mode, partly because segment loads work completely differently and partly because the default operand/address size is 32-bit and not 16-bit.
All BIOS functions are designed for real mode. They can't be used in protected mode.
Basically, I'd recommend:
pass the pointer to the data table to your assembly as a 32-bit integer/pointer (and not 2 separate 16-bit integers)
call a "go to real mode" function (which will be slightly tricky, as you'd also be switching from a 32-bit stack to a 16-bit stack and will need a "32-bit return instruction" in 16-bit code).
split the pointer to the data table into its segment and offset in assembly, and load the segment (which should work correctly as you're in real mode now)
call the BIOS function (which should work correctly as you're in real mode now)
call a "go to protected mode" function (which will be slightly tricky again, including a "16-bit return instruction" in 32-bit code).
return to the (32-bit protected mode) caller
Instructions for switching from real mode to protected mode, and switching from protected mode to real mode, are included in Intel's system programmer's guide. :)
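For the "split the pointer into its segment and offset" step, the arithmetic is the same / 16 and % 16 split used in the question, and it only makes sense for addresses below 1 MiB. A minimal sketch in C (the function names are made up; in the real thing this would be done in assembly after the switch to real mode):

```c
#include <assert.h>
#include <stdint.h>

/* Segment part of a linear address: linear / 16. */
uint16_t real_mode_segment(uint32_t linear)
{
    assert(linear < 0x100000u); /* must be below 1 MiB */
    return (uint16_t)(linear >> 4);
}

/* Offset part of a linear address: linear % 16. */
uint16_t real_mode_offset(uint32_t linear)
{
    return (uint16_t)(linear & 0xFu);
}

/* What the CPU computes in real mode: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, linear address 0x12345 splits into segment 0x1234 and offset 0x5, and real_mode_linear(0x1234, 0x5) gives 0x12345 back.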
I'm reading in the first bytes of a file with fread:
fread(&example_struct, sizeof(example_struct), 1, fp_input);
This ends up with different results under Linux and Solaris. The example_struct (Elf32_Ehdr) is part of the standard GNU C Library, defined in elf.h. I would be happy to know why this happens.
In general, the struct looks like the following:
typedef struct
{
unsigned char e_ident[LENGTH];
TYPE_Half e_type;
} example_struct;
The debug code:
for(i=0;i<sizeof(example_struct);i++){
printf("example_struct->e_ident[%i]:(%x) \n",i,example_struct.e_ident[i]);
}
printf("example_struct->e_type: (%x) \n",example_struct.e_type);
printf("example_struct->e_machine: (%x) \n",example_struct.e_machine);
Solaris output:
Elf32_Ehead->e_ident[0]: (7f)
Elf32_Ehead->e_ident[1]: (45)
...
Elf32_Ehead->e_ident[16]: (2)
Elf32_Ehead->e_ident[17]: (0)
...
Elf32_Ehead->e_type: (200)
Elf32_Ehead->e_machine: (6900)
Linux output:
Elf32_Ehead->e_ident[0]: (7f)
Elf32_Ehead->e_ident[1]: (45)
...
Elf32_Ehead->e_ident[16]: (2)
Elf32_Ehead->e_ident[17]: (0)
...
Elf32_Ehead->e_type: (2)
Elf32_Ehead->e_machine: (69)
Maybe similar to: http://forums.devarticles.com/c-c-help-52/file-io-linux-and-solaris-108308.html
You don't mention what CPUs the machines have (maybe Sparc64 in the Solaris machine and x86_64 in the Linux box), but I would guess that you're having an endianness issue. Intel, ARM and most other common architectures today are what is known as little-endian; the Sparc architecture is big-endian.
Let's assume we have the value 0x1234 in a CPU register and we want to store it in memory (or on hard drive, it doesn't matter where). Let N be the memory address we want to write to. We will need to store this 16 bit integer as two bytes in memory, here comes the confusing part:
Using a big-endian machine will store 0x12 at address N and 0x34 at address N+1.
A little-endian machine will store 0x34 at address N and 0x12 at address N+1.
If we store a value using a little endian machine and read it back using a big endian machine we will have swapped the two bytes around and you'll get the issue that you are seeing.
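The values in your output fit this pattern exactly. A minimal byte-swap sketch (swap16 is a made-up helper, not something from elf.h):

```c
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t value)
{
    return (uint16_t)((value << 8) | (value >> 8));
}
```

swap16(0x0200) is 0x0002 and swap16(0x6900) is 0x0069, i.e. the Solaris values for e_type and e_machine turn into the Linux ones.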
Probably because of differences in the structure packing between the two platforms. It's a bad idea to read structures directly (as units) from external media, since issues like these tend to pop up.
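One way to avoid reading the struct as a unit is to read the raw bytes and assemble each field yourself. The sketch below is only an illustration: the enum names, read_half and sample_header are made up, and the sample's class, data and version bytes are assumed. It relies on the ELF header storing its byte order in e_ident[5] and placing e_type and e_machine at offsets 16 and 18:

```c
#include <assert.h>
#include <stdint.h>

enum {
    EI_DATA_INDEX = 5,    /* e_ident[5] names the file's byte order */
    DATA_LSB = 1,         /* ELFDATA2LSB: little-endian file */
    DATA_MSB = 2,         /* ELFDATA2MSB: big-endian file */
    E_TYPE_OFFSET = 16,   /* e_type follows the 16-byte e_ident */
    E_MACHINE_OFFSET = 18
};

/* Hypothetical first 20 bytes of a little-endian ELF file; bytes
   16..19 match the question's output (02 00 69 00). */
const unsigned char sample_header[20] = {
    0x7f, 'E', 'L', 'F', 1, DATA_LSB, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0x02, 0x00, 0x69, 0x00
};

/* Read a 16-bit field in the file's byte order, independent of the
   byte order of the machine running this code. */
uint16_t read_half(const unsigned char *header, unsigned offset)
{
    unsigned char encoding = header[EI_DATA_INDEX];
    assert(encoding == DATA_LSB || encoding == DATA_MSB);
    if (encoding == DATA_LSB) {
        return (uint16_t)(header[offset] | (header[offset + 1] << 8));
    }
    return (uint16_t)((header[offset] << 8) | header[offset + 1]);
}
```

With this, read_half(sample_header, E_TYPE_OFFSET) is 2 and read_half(sample_header, E_MACHINE_OFFSET) is 0x69 on both kinds of machine.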
I'm writing a program where a constant is needed but the value for the constant will be determined during run time. I have an array of op codes from which I want to randomly select one and _emit it into the program's code. Here is an example:
unsigned char opcodes[] = {
0x60, // pushad
0x61, // popad
0x90 // nop
};
int random_byte = rand() % sizeof(opcodes);
__asm _emit opcodes[random_byte]; // optimal goal, but invalid
However, it seems _emit can only take a constant value. E.g, this is valid:
switch(random_byte) {
case 2:
__asm _emit 0x90
break;
}
But this becomes unwieldy if the opcodes array grows to any considerable length, and also essentially eliminates the worth of the array since it would have to be expressed in a less attractive manner.
Is there any way to neatly code this to facilitate the growth of the opcodes array? I've tried other approaches like:
#define OP_0 0x60
#define OP_1 0x61
#define OP_2 0x90
#define DO_EMIT(n) __asm _emit OP_##n
// ...
unsigned char abyte = opcodes[random_byte];
DO_EMIT(abyte)
In this case, the translation comes out as OP_abyte, so it would need a call like DO_EMIT(2), which forces me back to the switch statement and enumerating every element in the array.
It is also quite possible that I have an entirely invalid approach here. Helpful feedback is appreciated.
I'm not sure what compiler/assembler you are using, but you could do what you're after in GCC using a label. At the asm site, you'd write it as:
asm (
"target_opcode: \n"
".byte 0x90\n" ); /* Placeholder byte */
...and at the place where you want to modify that code, you'd use:
extern volatile unsigned char target_opcode[];
int random_byte = rand() % sizeof(opcodes);
target_opcode[0] = opcodes[random_byte]; /* store the selected opcode, not its index */
Perhaps you can translate this into your compiler's dialect of asm.
Note that all the usual caveats about self-modifying code apply: the code segment might not be writeable, and you may have to flush the I-cache before executing the modified code.
You won't be able to do any randomness in the C preprocessor AFAIK. The closest you could get is generating the random value outside. For instance:
cpp -DRND_VAL=$RANDOM ...
(possibly with a modulus to maintain the value within a range), at least in UNIX-based systems. Then, you can use the definition value, that will be essentially random.
How about
unsigned char operation[4]; // is it really only 1 byte all the time?
operation[0] = random_whatever();
operation[1] = 0xC3; // RET
void (*func)(void) = (void (*)(void))operation; // cast the data pointer to a function pointer
func();
Note that in this example you'd need to add a RET instruction to the buffer, so that in the end you end up at the right instruction after calling func().
Using an _emit at runtime into your program code is kind of like compiling the program you're running while the program is running.
You should describe your end goal rather than just your idea of using _emit at runtime; there might be a better way to accomplish what you want. Maybe you can write your opcodes to a regular data array and somehow make that bit of memory executable. That might be a little tricky due to security considerations, but it can be done.
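The "write your opcodes to a regular data array" part could look roughly like this. It is a sketch only: build_stub is a made-up helper, and making the buffer executable and calling it is platform-specific and deliberately left out:

```c
#include <assert.h>
#include <stddef.h>

const unsigned char opcodes[] = {
    0x60, /* pushad */
    0x61, /* popad */
    0x90  /* nop */
};

/* Write the opcode chosen at run time, followed by RET (0xC3), into
   buf. Returns the number of bytes written. The buffer is not
   executed here. */
size_t build_stub(unsigned char *buf, size_t buf_size, size_t index)
{
    assert(index < sizeof(opcodes));
    assert(buf_size >= 2);
    buf[0] = opcodes[index];
    buf[1] = 0xC3;
    return 2;
}
```

The index can come from rand() % sizeof(opcodes) as in the question, so the array can grow without any switch statement.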