What is a privileged instruction? - c

I have added some code which compiles cleanly and have just received this Windows error:
---------------------------
(MonTel Administrator) 2.12.7: MtAdmin.exe - Application Error
---------------------------
The exception Privileged instruction.
(0xc0000096) occurred in the application at location 0x00486752.
I am about to go on a bug hunt, and I am expecting it to be something silly that I have done which just happens to produce this message. The code compiles cleanly with no errors or warnings. The size of the EXE file has grown to 1,454,132 bytes and includes links to ODCS.lib, but it is otherwise pure C to the Win32 API, with DEBUG on (running on a P4 on Windows 2000).

To answer the question, a privileged instruction is a processor op-code (assembler instruction) which can only be executed in "supervisor" (or Ring-0) mode.
These types of instructions tend to be used to access I/O devices and protected data structures from the Windows kernel.
Regular programs execute in "user mode" (Ring-3) which disallows direct access to I/O devices, etc...
As others mentioned, the cause is probably a corrupted stack or a messed up function pointer call.

This sort of thing usually happens when using function pointers that point to invalid data.
It can also happen if you have code that trashes your return stack. These sorts of bugs can be quite tricky to track down because they are usually hard to reproduce.

A privileged instruction is an IA-32 instruction that is only allowed to be executed in Ring-0 (i.e. kernel mode). If you're hitting this in userspace, you've either got a really old EXE, or a corrupted binary.

As I suspected, it was something silly that I did. I think I solved this twice as fast because of some of the clues in the answers and comments above. Thanks for those, especially to those who pointed to something early in the app overwriting the stack. I actually found several answers here more useful than the post I have marked as answering the question, as they cued me as to where to look, though I think the marked post best sums up the answer.
As it turned out, I had just added a button that went over the maximum size of an array holding some toolbar button information (which was on the stack). I had forgotten that
#define MAX_NUM_TOOBAR_BUTTONS (24)
even existed!
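A minimal sketch of how a bounds check would have turned this stack overwrite into an ordinary failure. Only the MAX_NUM_TOOBAR_BUTTONS define comes from the post; the struct layout and the toolbar_add_button name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUM_TOOBAR_BUTTONS (24)

/* Hypothetical per-button record; the real fields are not shown in the post. */
struct toolbar_button {
    int command_id;
};

struct toolbar {
    struct toolbar_button buttons[MAX_NUM_TOOBAR_BUTTONS];
    size_t count;   /* number of slots currently in use */
};

/* Adds a button and returns 1, or returns 0 without writing when the array is full. */
int toolbar_add_button(struct toolbar *tb, int command_id)
{
    assert(tb != NULL);
    if (tb->count >= MAX_NUM_TOOBAR_BUTTONS)
        return 0;   /* refuse instead of writing past the end of the array */
    tb->buttons[tb->count].command_id = command_id;
    tb->count++;
    return 1;
}
```

With a check like this, the 25th button is rejected by the caller-visible return value rather than silently overwriting whatever sits next to the array on the stack.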

The first possibility I can think of is that you are using a local array declared near the top of the function. If your bounds checking has gone wrong, a write past the end of the array can overwrite the return address, so that it points to some instruction that only the kernel is allowed to execute.

The error location 0x00486752 seems really small to me, before where executable code usually lives. I agree with Daniel, it looks like a wild pointer to me.

I saw this with Visual C++ 6.0 in the year 2000.
The debug C++ library had calls to physical I/O instructions in it, in an exception handler.
If I remember correctly, it was dumping status to an I/O port that used to be for DMA base registers, which I assume someone at Microsoft was using for a debugger card.
Look for some error condition that might be latent causing diagnostics code to run.
I was debugging, backtracked and read the disassembly. It was an exception while processing std::string, maybe indexing off the end.

Most processors manufactured in the last 15 years have some special instructions which are very powerful. These privileged instructions are reserved for the operating system kernel and cannot be used by user-written programs.
This restricts the damage that a user-written program can inflict upon the system and cuts down the number of times that the system actually crashes.

When executing in kernel mode, the operating system has unrestricted access to both the kernel and the user program's memory.
The load instructions for the base and limit registers are privileged instructions.
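As a rough illustration of why loading these registers has to be privileged, here is a sketch of the check that base and limit registers imply, assuming the simple textbook model where every user address is compared against the limit and then relocated by the base. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the two registers the kernel loads for a process. */
struct relocation {
    uint32_t base;    /* start of the process's region in physical memory */
    uint32_t limit;   /* size of that region in bytes */
};

/* Returns 1 and stores the physical address when the access is legal.
   Returns 0 when the hardware would trap to the operating system instead. */
int translate(const struct relocation *r, uint32_t logical, uint32_t *physical)
{
    assert(r != NULL && physical != NULL);
    if (logical >= r->limit)
        return 0;                     /* outside the region: trap */
    *physical = r->base + logical;    /* inside the region: relocate */
    return 1;
}
```

If a user program could change base or limit itself, it could make any address pass this check, which is why only kernel-mode code is allowed to load them.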

Related

how do you intercept the address of an instruction that is writing to a segment of memory?

Imagine we have a usual instruction such as this one
mov [eax], ebx
and eax contains some address that we would like to write to.
The idea is to write a C program that tells you which address contains the instruction, if we already know the address that it's going to be writing to.
The real question:
Write a C program using the free Sony pspsdk that would accomplish the same thing.
The psp uses MIPS III / IV and the instruction would look something like
sw a0 $00(t0)
##which literally spells out store register a0 at offset t0 + 0 bytes. where t0 would
## contain something like 0x08800000
Disclaimer: it is still useful to know how to do this on Windows, so if somebody only knows how to do this on Windows or even OS X, that would still be appreciated, as it could provide relevant information on similar programming practices for accomplishing this particular task.
Intercepting an instruction that writes to a particular address is not a normal activity in programs.
It is a feature provided by some debuggers. There are at least three ways debuggers may be able to do this:
A debugger can examine the program code and find where a particular instruction writes to a particular address. This is actually a hugely complicated activity that requires interpreting the instructions. Often, a debugger cannot do it completely, as doing so in general is equivalent to interpreting and executing the program the same way the computer processor does, and that is very slow to do in software. Instead, the debugger may plan part of program execution and put in a breakpoint at a spot where it is unable to easily continue, such as at a branch instruction that depends on a value the debugger is not prepared to compute. A breakpoint is a special instruction that interrupts program execution and, in this case, results in the operating system transferring control to the debugger. At that time, the debugger removes the breakpoint, requests that the instruction be single-stepped (that the processor execute the single instruction and then interrupt program execution immediately), examines the result, and continues.
A debugger can mark the page of memory containing the desired address as no-access. Then, whenever the program accesses that memory, the hardware will interrupt program execution, and the operating system will transfer control to the debugger. The debugger examines the instruction that caused the interruption. If the instruction is accessing the target address, the debugger acts on that. If it is not, the debugger changes the memory protection to allow access, requests that the instruction be single-stepped, changes the memory protection to disallow access, and resumes the program to wait for the next interruption. (Instead of single-stepping the instruction, the debugger might just emulate it, since that might avoid changing the memory protection twice, which can be expensive.)
Some computer processor models have features to support this sort of debugging feature. The debugger can request that a portion of memory be monitored, so that the hardware interrupts program execution when a particular address is accessed, instead of when any part of a whole memory page is accessed.
I cannot speak to the Sony platform you are using. You would have to check its documentation or ask others regarding the availability of such features. Since this is a feature most often used by debuggers, investigating the documentation regarding debugging could be a way to find out whether the system supports such a feature.
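The page-protection approach can be tried from inside a single process. The following is a minimal sketch assuming Linux and POSIX signals, not the PSP; all names are illustrative. It maps one read-only page, catches the fault caused by the first write, records the data address (and, on x86-64 Linux only, the address of the writing instruction), then lifts the protection so the write is retried. Calling mprotect from a signal handler is not guaranteed to be async-signal-safe by POSIX, although it generally works on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

static void *watch_page;              /* start of the protected page */
static size_t watch_len;              /* its length in bytes */
static volatile uintptr_t hit_addr;   /* data address of the trapped write */
static volatile uintptr_t hit_pc;     /* writing instruction, 0 if unknown */

static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)watch_page;
    (void)sig;
    if (addr < base || addr >= base + watch_len)
        _exit(1);                     /* a genuine crash somewhere else */
    hit_addr = addr;
#if defined(__linux__) && defined(__x86_64__)
    hit_pc = (uintptr_t)((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP];
#else
    (void)ctx;                        /* register layout differs per platform */
#endif
    /* Allow the write; when the handler returns, the instruction is retried. */
    mprotect(watch_page, watch_len, PROT_READ | PROT_WRITE);
}

/* Maps one read-only page and arms the handler; returns the page or NULL. */
void *watch_begin(void)
{
    struct sigaction sa;
    long page = sysconf(_SC_PAGESIZE);
    assert(watch_page == NULL);       /* this sketch supports one watch only */
    if (page <= 0)
        return NULL;
    watch_len = (size_t)page;
    watch_page = mmap(NULL, watch_len, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (watch_page == MAP_FAILED)
        return NULL;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;
    return watch_page;
}

uintptr_t watch_hit_addr(void) { return hit_addr; }
uintptr_t watch_hit_pc(void)   { return hit_pc; }
```

Unlike the debugger described above, this sketch leaves the page writable after the first hit. Re-protecting it after each write would need single-stepping or instruction emulation, which is the expensive part.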

Get user stackpointer from task_struct

I have a kcore and I want to get a userspace backtrace from it, because someone in our application is making a lot of munmap calls and making the system hang (CPU soft lockup 22s!). I looked at some macros, but they only give me the kernel backtrace. What I want is the userspace backtrace.
Good news is I have pointer to task_struct.
task_struct->thread->sp (Kernel stack pointer)
task_struct->thread->usersp (user stack pointer) but this is junk
My question is how to get userspace backtrace from kcore or task_struct.
First of all, a vmcore is an immediate full memory snapshot, so it contains all pages (including user pages). But if the user pages are swapped out, they cannot be accessed. That is why kdump (and similar tools, such as your gdb Python script) focus on kernel debugging functionality only. For userspace debugging and stack traces you have to use the coredump functionality. By default, coredumps are produced when the kernel sends (for example) SIGSEGV to your app, but you can make them whenever you want by using gcore or by modifying the kernel. There is also a "userspace" way of making a process dump; see the google coredumper project.
Also, you can try to unwind the user stack trace directly from kcore, but this is tricky, and you will have to hope that the userspace stack is not swapped out at that moment. (Do you use swap?) See __save_stack_trace_user, which shows how to retrieve the userspace context.
First of all vmcores typically don't contain user pages. I'm unaware of any magic which would help here - you would have to inspect vm mappings for given task address space and then inspect physical pages, and I highly doubt the debugger knows how to do it.
But most importantly you likely don't have any valid reason to do it in the first place.
So, what are you trying to achieve?
=======================
Given the edit:
some one from our application is making lot of munmap and making the
system hang(CPU soft lockup 22s!).
There may or may not be an actual kernel issue which must be debugged. I don't see any use for userspace stacktraces for this one though.
So, as I understand it, the presumed issue is excessive mmap + munmap calls from the application. Inspecting the backtrace of the thread reported with said lockup may or may not happen to catch the culprit. What you really want is to collect backtraces of /all/ callers and sort them by frequency. This can be done (albeit with pain) with systemtap.

FreeRTOS - Stack corruption on STM32F4

I am currently having problems with what I think is stack corruption or some configuration error while running FreeRTOS on an STM32F407 target.
I have looked at FreeRTOS stack corruption on STM32F4 with gcc but got no help there.
The application runs two tasks and relies on one CAN interrupt. The workflow is as follows:
The two tasks, network_task and app_task, are created along with two queues, raw_msg_queue and app_msg_queue. The CAN interrupt is also set up.
The network_task has the highest priority and starts waiting on the raw_msg_queue, indefinitely.
The app_task is next and starts waiting on the app_msg_queue.
The CAN interrupt then triggers because of an external event, adding a CAN message to the raw_msg_queue.
The network_task wakes up, processes the message, adds the processed message to the app_msg_queue and then continues to wait on the raw_msg_queue.
The app_task wakes up and I get a hard fault.
The thing is that I have wrapped the calls that app_task makes to xQueueReceive in two steps for end-user convenience and portability. The total function chain for app_task is network_receive(..) -> os_queue_receive(..) -> xQueueReceive(..). This works well, but when it returns from xQueueReceive(..) it only manages to return to os_queue_receive(..) before it returns to a seemingly random memory location and I get a hard fault.
The stack sizes should be adequate and are set to 2048 for both, all large data structures are passed around as pointers.
I am running my code on two STM32F407. FreeRTOS is at version 7.4.2, the latest at the time of writing.
I am really hoping that someone can help me out here!
First, you can take a look here and try to get more info about the hard fault.
You may also want to check your interrupt priority setting, as the tricky ARM Cortex-M interrupt priority mechanism causes some trouble in FreeRTOS. Refer to here.
I know this question is rather old, but perhaps this could help other people out facing a similar issue. In FreeRTOS, you can utilize the
void vApplicationStackOverflowHook(xTaskHandle xTask, signed char *pcTaskName)
function to detect a stack overflow and grab relevant information about the offending task. It's possible that data would be corrupt due to the overflow, but you can at least address the fact that an overflow occurred (reset system, set error flag/LED, etc.).
For this specific question, I'd be curious to see the thread initialization code as well as the interrupt routine. If the problem is in fact an overflow, I think it would be fairly simple to adjust these parameters until the problem goes away. You mention 2048 bytes should be sufficient for each thread. If that's truly the case, I doubt the problem is an overflow. At that point, it's more likely you're dereferencing a dangling pointer to a stale memory address.

Can you detect a debugger attached to your process using Div by Zero

Can you detect whether or not a debugger is attached to your native Windows process by using a high precision timer to time how long it takes to divide an integer by zero?
The rationale is that if no debugger is attached, you get a hard fault, which is handled by hardware and is very fast. If a debugger is attached, you instead get a soft fault, which is percolated up to the OS and eventually the debugger. This is relatively slow.
Since there is absolutely nothing you can do to prevent a determined person from reverse engineering your code, no clever approach you find will be significantly better than calling IsDebuggerPresent()
No. A sufficiently determined attacker would simply host your process in a VM and break in that way.
Besides, you don't need to attach a debugger to attack a program: grabbing a minidump will let an adversary inspect the memory state offline, or using process explorer you can inspect open handles to determine what files are vulnerable.
If you were going to use an exception to determine whether a naive debugger were attached, I'd personally use INT_MIN/-1 to trigger an integer overflow exception. Most don't know about that one.
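For reference, these are the two operand combinations that make an integer division fault on x86. Both are undefined behaviour in C, so they cannot safely be executed just to show them. This sketch only identifies them and guards a division against them; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 when a / b is one of the two cases that make x86 idiv fault:
   division by zero, or INT_MIN / -1 (the quotient does not fit in an int). */
int int_division_traps(int a, int b)
{
    return b == 0 || (a == INT_MIN && b == -1);
}

/* Divides only when it is safe. Returns 1 and stores the quotient on success,
   or returns 0 and leaves *out untouched when the division would fault. */
int checked_div(int a, int b, int *out)
{
    assert(out != NULL);
    if (int_division_traps(a, b))
        return 0;
    *out = a / b;
    return 1;
}
```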
Most debuggers used by reverse engineers come with methods to remove 99% of the marks left by debuggers, and most of these debuggers provide exception filtering, meaning the speed difference would be undetectable.
It's more productive to prevent the debugger from attaching in the first place, but in the long run you'll never come out ahead unless you make the required investment of effort infeasible.

Ptrace mprotect debugging trouble

I'm having trouble with a research project.
What I am trying to do is use ptrace to watch the execution of a target process.
With the help of ptrace I inject an mprotect syscall into the target's code segment (similar to a breakpoint) and set the stack protection to PROT_NONE.
After that I restore the original instructions and let the target continue.
When I get an invalid-permission segfault, I inject the syscall again to unprotect the stack, then execute the instruction that caused the segfault and protect the stack again.
(This does indeed work for simple programs.)
My problem now is that with this setup the target crashes (pretty) randomly in library function calls (no matter whether I use dynamic or static linking).
By crashing I mean that it either tries to access memory which for some reason is not mapped, or it just keeps hanging in the function __lll_lock_wait_private (that was following a malloc call).
Let me emphasize again that the crashes don't always happen and don't always happen at the same positions.
It sounds like a synchronisation problem, but as far as I can tell (meaning I looked into /proc/pid/task/) there is only one thread running.
So do you have any clue what could be the reason for this?
Please tell me your suggestions even if you are not sure; I am running out of ideas here ...
It's also possible the non-determinism is created by address space randomization.
You may want to disable that to try and make the problem more deterministic.
EDIT:
Given that turning ASR off 'fixes' the problem, the underlying problem might be:
Somewhere thinking 0 is invalid when it should be valid, or vice versa. (What I had.)
Using addresses from one run against a different run?

Resources