C traps and pitfalls 2.1
I thought 0 was always an invalid address. How could he put a function at that location?
It's architecture dependent.
From the book:
I once talked to someone who was writing a C program that was going to run stand-alone in a small microprocessor (answer right here). When this machine was switched on, the hardware would call the subroutine whose address was stored in location 0. In order to simulate turning power on, we had to devise a C statement that would call this subroutine explicitly. After some thought, we came up with the following:
(*(void(*)())0)();
For microprocessors/microcontrollers, you have raw access to any RAM/Flash address unless prohibited in hardware. Therefore accessing address 0 on a microprocessor is completely valid.
I think that (* (void (*)()) 0) means that it is trying to invoke a function that is located in memory at address 0x00000000 (which is probably an invalid address).
A very similar question on Stack Overflow, "What does this C statement mean?", may help.
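To unpack the expression, it can help to name the pointer type with a typedef. The following is a minimal sketch, not the book's code: handler_t, fake_reset_handler and call_address are hypothetical names, and converting between an integer and a function pointer is implementation-defined rather than guaranteed by the C standard. It calls a real function through its numeric address so that it can run on a hosted system, where jumping to address 0 would normally crash.

```c
#include <assert.h>
#include <stdint.h>

/* handler_t: pointer to a function taking no arguments and returning nothing. */
typedef void (*handler_t)(void);

static int reset_count = 0;

/* Stand-in for the subroutine the hardware would call at power-on. */
static void fake_reset_handler(void)
{
    reset_count++;
}

/* Treat a numeric address as a function and call it. */
static void call_address(uintptr_t address)
{
    assert(address != 0);                    /* not callable on a hosted system */
    handler_t handler = (handler_t)address;  /* integer -> function pointer */
    handler();                               /* equivalent to (*handler)() */
}
```

Calling call_address((uintptr_t)fake_reset_handler) runs fake_reset_handler once. Written without the typedef and with 0 as the address, the same steps collapse into (*(void(*)())0)();.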
Related
I was wondering what is the equivalent of the command Peek and Poke (Basic and other variants) in C. Something like PeekInt, PokeInt (integer).
Something that deals with memory banks, I know in C there are lots of ways to do this and I am trying to port a Basic program to C.
Is this just using pointers? Do I have to use malloc?
Yes, you will have to use pointers. No, it does not necessarily involve malloc.
If you have a memory address ADDRESS and you want to poke a byte value into it, in C that's
char *bytepointer = (char *)ADDRESS;
*bytepointer = bytevalue;
As a concrete example, to poke the byte value 0x12 to memory address 0x3456, that would look like
char *bytepointer = (char *)0x3456;
*bytepointer = 0x12;
If on the other hand you want to poke in an int value, that looks like
int *intpointer = (int *)ADDRESS;
*intpointer = intvalue;
"Peeking" is similar:
char fetched_byte_value = *bytepointer;
int fetched_int_value = *intpointer;
What's happening here is basically that these pointer variables like bytepointer and intpointer contain memory addresses, and the * operator accesses the value that the pointer variable points to. And this "access" can be either for the purpose of fetching (aka "peeking"), or storing ("poking").
See also question 19.25 in the C FAQ list.
Now, with that said, I have two cautions for you:
Pointers are an important but rather deep concept in C, and they can be tricky to understand at first. If you've never used a pointer, if you haven't gotten to the Pointers chapter in a good C textbook, the superficial introduction I've given you here isn't going to be enough, and there are all sorts of pitfalls waiting for you if you don't learn about them first.
Setting a pointer to point to an absolute, numeric memory address, and then accessing that location, is a low-level, high-powered technique that ends up being rather rare these days. It's useful (and still commonly used) if you're doing embedded programming, if you're using a microcontroller to access some hardware, or perhaps if you're writing an operating system. In more ordinary, "applications" level programming, on the other hand, there just aren't any memory addresses which you're allowed to access in this way.
What values do you want to peek or poke, to accomplish what? What is that Basic program supposed to do? If you're trying to, for example, poke values into display memory so they show up on the screen, beware that on a modern operating system (that is, anything newer than MS-DOS), it doesn't work that way any more.
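If the Basic program only uses PEEK and POKE as scratch storage, rather than to reach real hardware, one portable option is to back them with an ordinary array. This is a sketch under that assumption: the names basic_memory, basic_peek, basic_poke, basic_peek_int and basic_poke_int are hypothetical, as are the 64 KiB size and the little-endian byte order for the integer variants.

```c
#include <assert.h>
#include <stdint.h>

#define BASIC_MEMORY_SIZE 65536u   /* one 64 KiB bank; an assumption for this sketch */

static uint8_t basic_memory[BASIC_MEMORY_SIZE];

/* PEEK: read one byte from the emulated memory. */
static uint8_t basic_peek(uint32_t address)
{
    assert(address < BASIC_MEMORY_SIZE);
    return basic_memory[address];
}

/* POKE: write one byte to the emulated memory. */
static void basic_poke(uint32_t address, uint8_t value)
{
    assert(address < BASIC_MEMORY_SIZE);
    basic_memory[address] = value;
}

/* PokeInt: store a 16-bit value, low byte first (little-endian). */
static void basic_poke_int(uint32_t address, uint16_t value)
{
    basic_poke(address, (uint8_t)(value & 0xFFu));
    basic_poke(address + 1, (uint8_t)(value >> 8));
}

/* PeekInt: load a 16-bit value that was stored low byte first. */
static uint16_t basic_peek_int(uint32_t address)
{
    return (uint16_t)(basic_peek(address) | (basic_peek(address + 1) << 8));
}
```

Because the array is ordinary program memory, this works on any operating system, but it cannot reach the screen or devices the way the original POKEs may have.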
Yes, it's exactly pointers.
To read from the location pointed to by a pointer, use value = *pointer (peek).
To write to that location, use *pointer = value (poke).
Notice the asterisk before pointer.
malloc is used when you need to allocate memory on the heap at run time, but it's not the only way to get a pointer. You can also get a pointer simply by putting & before the name of a variable.
int value = 2;
int *myPointer = &value;
Remember to free any memory allocated with malloc once you are done with it. See also the valgrind tool.
I also recommend sticking with the non-malloc solution because it is simpler and easier to maintain.
You can call a function many times; each call allocates fresh memory for its local variables, which you can then work with.
The concept of peek/poke (to directly access memory) in C is irrelevant - you probably don't need them and if you do you really should ask a specific question about the code you are trying to write. If you are building 32 or 64 bit code for a modern platform, peek() / poke() are largely irrelevant in any event and unlikely to do what you expect - unless you expect a memory protection fault! You will be digging out some antique compiler if you are building 16-bit x86 "real-mode" code.
C is a systems-level language and as such operating directly on memory is fundamental to the language. It is also a compiled rather than interpreted language and you would normally access a memory location through a variable/symbol and let the linker resolve/locate its address. However in some cases (in embedded systems code or kernel level device drivers for example) you might need to access a specific physical address which you might do as follows:
#include <stdint.h>
const uint16_t* const KEYBOARD_STATE_FLAGS_ADDR = ((uint16_t*)0x0417) ;
...
// Peek keyboard state
keyboard_flags = *KEYBOARD_STATE_FLAGS_ADDR ;
If you really want functions to observe/modify arbitrary address locations then for example:
uint16_t peek16( uintptr_t addr )
{
    return *((volatile uint16_t*)addr) ;
}
void poke16( uintptr_t addr, uint16_t value )
{
    *((volatile uint16_t*)addr) = value ;
}
... and perhaps corresponding implementations for 8, 32 and even 64 bit access. However good luck avoiding a SEG-FAULT exception on a modern system where memory is virtualised and protected - as I said it largely makes no sense.
Something that deals with memory banks,
I assume given the [qbasic] tag that you are referring to 16-bit x86 segment:offset addressing rather than "memory banks" - that is an architecture specific thing, irrelevant on modern systems. If you really need to deal with that (i.e. are targeting 16-bit x86 C code), you need to mention that - it is a very different issue, specific to a particular obsolete architecture. You would also need to specify what compiler you are using, because it involves non-standard compiler extensions (and to be frank, techniques I have long forgotten; it would be software archaeology).
Is this just using pointers?
Essentially yes, in C pointers are addresses, but in 16-bit x86 code there is the added complication of the segmented memory architecture and the concept of near and far pointers. If you are porting this code to a modern system, it is unlikely to be a simple matter of porting peek/poke commands verbatim - they are unlikely to work. If you are just translating 16-bit QBasic code to 16-bit C - why?!
Do I have to use malloc?
No - or at least that depends on the semantics of the code you are porting, but unlikely, and it is not relevant to peek/poke specifically. malloc allocates memory from a system heap and provides its address - which is non-deterministic. peek/poke are a means of accessing specific memory addresses.
In the end it is probably not a matter of directly implementing the peek/poke commands in the code you are porting, and whether that would work or not would depend on the platform and architecture you are porting the code to. Most likely you would be better off looking at the higher level semantics of the code you are porting and implementing that in a manner that suits the platform you are targeting or better be platform independent and therefore portable.
Basically, when porting you should ask yourself what a given piece of code does, and port that rather than translating line by line. That consideration should be made at as high a level as possible to make the best use of the target language and/or its library. peek/poke were typically used either for accessing machine features not exposed through some higher-level interface, or for directly accessing hardware or video buffers. On a modern platform much of that is either unnecessary or won't work - and probably both.
I was trying to write my own boot loader for an Atmel AVR microcontroller, and I have been referring to a code base from GitHub. Thanks to ZEVERO for the code base.
At a basic level I understand the code, but at line 224 I found this line:
Reference to the code
if (pgm_read_word(0) != 0xFFFF) ((void(*)(void))0)(); //EXIT BOOTLOADER
I understand the if condition, but I am having trouble with the statement that runs when it is true, i.e.
((void(*)(void))0)();
The author's only explanation is the comment //EXIT BOOTLOADER.
My first question is: what is the meaning of this complex expression?
((void(*)(void))0)();
My second question is: does it exit the execution of the code in the microcontroller?
As @iBug pointed out, ((void(*)(void))0)(); invokes a function call on a NULL function pointer.
In effect, that transfers program control to memory address 0. Now, on a workstation, that would be colossal UB, most likely resulting in a segfault.
However, since the code in question is for a hardware bootloader, it's not UB, it (apparently) just exits the bootloader.
At the hardware level, almost everything is implementation dependent, and almost nothing is portable. You can't expect C code targeted at a specific hardware platform to be in any way representative of generally-accepted C patterns and practices.
((void(*)(void))0)(); tries to call a NULL function pointer. User programs (not bootloaders) for AVR microcontrollers usually start execution at address 0. AVR-GCC's ABI uses an all-0-bit representation of NULL function pointers, so this call will (among other things) transfer execution to the user program. Essentially, it works as a slower version of __asm__ __volatile__("jmp 0");, and assumes that the user program's startup code will reinitialize the stack pointer anyway.
Calling through a NULL function pointer is undefined behavior, so there's no guarantee that this trick will work with other compilers, later versions of GCC, or even different optimization settings.
The if (pgm_read_word(0) != 0xFFFF) check before the call is probably to determine if a user program is present: program memory words that have been erased but not written will read as 0xFFFF, while most programs start with a JMP instruction to skip over the rest of the interrupt vector table, and the first word of a JMP instruction is never 0xFFFF.
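The decision described above can be sketched on a host machine without any AVR headers. This is only an illustration of the logic, not the bootloader's actual code: user_program_present, exit_bootloader, fake_app and app_started are hypothetical names, and the first flash word is passed in as a plain value instead of being read with pgm_read_word(0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERASED_FLASH_WORD 0xFFFFu   /* erased, unwritten flash reads as all ones */

typedef void (*app_entry_t)(void);

static int app_started = 0;

/* Stand-in for the user program that would live at address 0 on the AVR. */
static void fake_app(void)
{
    app_started = 1;
}

/* True if the first word of program memory looks like real code. */
static bool user_program_present(uint16_t first_flash_word)
{
    return first_flash_word != ERASED_FLASH_WORD;
}

/* Hand control to the application only if one appears to be installed. */
static bool exit_bootloader(uint16_t first_flash_word, app_entry_t app_entry)
{
    assert(app_entry != NULL);
    if (!user_program_present(first_flash_word))
        return false;              /* nothing flashed yet: stay in the bootloader */
    app_entry();                   /* on the AVR this call never returns */
    return true;
}
```

On real hardware the entry point would be address 0 and the call would not come back, so the return value only exists to make the host-side sketch observable.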
As has been pointed out before, calling this function simply results in a jump to address 0.
As the code at this address is typically not defined by your own program, but rather by the specific environment, behavior totally depends on this environment.
Your question is tagged AVR/Atmel: on AVRs, jumping to address 0 simply results in a restart (nearly the same behavior as a hardware reset, but beware: unlike a "real" reset, the MCU keeps its interrupt enabled/disabled state). A "cleaner" program would probably use the watchdog timer to force a "real" reset (wdt_enable() followed by an endless loop; note that wdt_reset() only restarts the watchdog timer, it does not reset the MCU).
It will simply call the address 0 as if it was a function returning void and taking no arguments. Or... less simply the address that is the bit pattern of the null pointer. Or even less simply, the behaviour is undefined so it might do anything unexpected.
I found a C code that looks like this:
#include <stdio.h>
char code[] =
"\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x46\x61\x74\x61\x75\xf2\x81\x7e"
"\x08\x45\x78\x69\x74\x75\xe9\x8b\x7a\x24\x01\xc7\x66\x8b\x2c"
"\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf\xfc\x01\xc7\x68\x72\x6c"
"\x64\x01\x68\x6c\x6f\x57\x6f\x68\x20\x48\x65\x6c\x89\xe1\xfe"
"\x49\x0b\x31\xc0\x51\x50\xff\xd7";
int main(void)
{
int (*func)();
func = (int(*)()) code;
(int)(*func)();
return 0;
}
For the given hex code this program runs fine and prints "HelloWorld". My thinking was that the hex code is a sequence of machine instructions, and that by calling a function pointer that points to it we execute those instructions.
Is my understanding right? Is there anything to improve?
How is this hex code generated?
Thanks in advance.
You are correct that by forcing a function pointer like this you are calling into machine instructions written as a hexadecimal string variable.
I doubt that a program like this would work on any CPU since about 2005.
On most RISC CPUs (like ARM) and on all Intel and AMD CPUs that support 64-bit, memory pages have a No Execute bit. Or in reverse an Execute bit.
On memory pages that do not have an Execute bit, the CPU will not run code. Compilers do not put variables into executable memory pages.
In order to run injected shellcode, attackers now have to use "return into libc" or function pointer overwrite attacks which set things up to call mprotect or VirtualProtect to set the execute bit on their shellcode. Either that, or get it injected into an executable space such as the one a Java, .NET, or JavaScript JIT compiler uses.
Security hardened kernels will deny the ability to call mprotect. Once the program's address space is set by the dynamic library loader, it sets a security flag and no new executable pages can be created.
In order to make it always work you could assign some executable_readwrite space with malloc or the like and put the code in there and then execute it. Then there won't be any access violation faults.
#include <windows.h>
#include <cstring>
#include <iostream>
// code[] is the 113-byte array from the question.
int main()
{
    void (*FunctionPointer)();
    // Pass NULL so the system chooses where to place the region.
    void* PointerToNewMemoryRegion = VirtualAlloc(NULL, 113, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (PointerToNewMemoryRegion == NULL)
    {
        std::cout << "Failed to allocate memory region. Error code: " << GetLastError();
        return 1;
    }
    std::memcpy(PointerToNewMemoryRegion, code, 113);
    FunctionPointer = (void(*)()) PointerToNewMemoryRegion;
    (*FunctionPointer)();
    // With MEM_RELEASE the size argument must be 0.
    VirtualFree(PointerToNewMemoryRegion, 0, MEM_RELEASE);
}
However, the shellcode never returns to my code, so the final VirtualFree call is never reached and the memory is leaked.
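On POSIX systems the same idea can be sketched with mmap and mprotect. This is a minimal sketch under several assumptions: it only does anything on x86-64, the six bytes in tiny_code (mov eax, 42 followed by ret) are a hand-written stand-in for the question's shellcode, and run_machine_code is a hypothetical name. It maps the memory writable first and only then flips it to executable, since some hardened systems refuse pages that are writable and executable at once. Unlike the shellcode above, these bytes return to the caller, so the region can be released.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* x86-64 machine code for: mov eax, 42 ; ret */
static const unsigned char tiny_code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

/* Copy code into fresh memory, make it executable, call it, and free it.
 * Returns the code's return value, or -1 if the platform refuses. */
static int run_machine_code(const unsigned char *code, size_t length)
{
    assert(code != NULL && length > 0);
#if defined(__x86_64__) && defined(MAP_ANONYMOUS)
    void *region = mmap(NULL, length, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    memcpy(region, code, length);
    if (mprotect(region, length, PROT_READ | PROT_EXEC) != 0) {
        munmap(region, length);
        return -1;
    }
    /* void * -> function pointer is a common extension, not standard C. */
    int (*entry)(void) = (int (*)(void))region;
    int result = entry();
    munmap(region, length);
    return result;
#else
    (void)code;
    (void)length;
    return -1;   /* bytes above are x86-64 only */
#endif
}
```

A call such as run_machine_code(tiny_code, sizeof tiny_code) yields 42 where the platform allows it and -1 otherwise.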
To ask this question from a "general C" point of view isn't all that meaningful.
First of all, your code has many major problems:
The literal "\xFF\xFF\xFF" equals 0xFFFFFF00, not 0x00FFFFFF as may or may not have been the intention.
What this hex code means and if it is at all meaningful, is endian-dependent and also depends on the address bus width of the given CPU.
As others have mentioned, casts between function pointers and regular pointers aren't supported or well-defined by C; the C standard lists them as a "common extension".
That being said, code like this has about one single purpose, and that is various forms of boot loaders and self-updating software used in embedded systems.
Suppose for example that you have a boot loader program that is tasked with re-programming something in the very same segment of flash memory where said program itself is executed from. That is impossible because of the way the memory hardware works. So in order to do so, you would have to execute the actual flash programming routine from RAM. Since the array of hex gibberish is stored in RAM, the program can execute from there with the function pointer trick, assuming that the C compiler has a non-standard extension that allows the cast.
As for how to generate the code, you either write it all in assembler and then translate the assembler instructions to op codes manually (very tedious). Or more likely, you write the function in C and then disassemble it and copy/paste the op codes from the disassembly.
The latter is more dangerous though, as the critical part of getting code like this to work is calling convention: you must be absolutely sure that the function stacks/unstacks things properly when it is called and when it is done, restoring the contents of any CPU registers used etc. Which may force you to write part of the function in assembler anyhow. Needless to say, the code will be completely non-portable.
I am trying to understand some Linux kernel driver code written in C for a USB Wi-Fi adapter. Line 1456 in file /drivers/net/wireless/rtl818x/rtl8187/dev.c (just in case anyone wanted to refer to the kernel code for context) reads:
priv->map = (struct rtl818x_csr *)0xFF00;
I am curious about what exactly the right operand is doing here - (struct rtl818x_csr *)0xFF00;. I have been interpreting this as saying "cast memory address 0xFF00 to be of type rtl818x_csr and then assign it to priv->map". If my interpretation is correct, what is so special about memory address 0xFF00 that the driver can reliably tell that what it is after will always be at this address? The other thing I am curious about is that 0xFF00 is only 16-bits. I would be expecting 32/64-bits if it were casting a memory address.
Can anyone clarify exactly what is going on in this line of code? I imagine there's a flaw in my understanding of the C syntax.
0xFF00 is an address in the IO address space of the system. If you look in the code, the address is never directly dereferenced but accessed through IO functions.
For example, in the call
rtl818x_iowrite8(priv, &priv->map->EEPROM_CMD,
RTL818X_EEPROM_CMD_CONFIG);
which then calls Linux kernel low level IO functions.
The address is cast to a pointer to a struct to give access to offsets from the address, for example:
0xFF00 + offsetof(struct rtl818x_csr, EEPROM_CMD)
Note that in the rtl818x_iowrite8 call above, no dereference occurs when passing the &priv->map->EEPROM_CMD argument because of the & operator; only the address + offset is computed. The dereference happens later, within the internal low-level functions called inside rtl818x_iowrite8.
Casting an absolute address to a pointer to a structure is a common way in drivers to access the (memory mapped) registers of a device as a normal C structure.
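The address-plus-offset computation can be tried on a host machine without touching any hardware. This is a sketch with made-up names: struct demo_csr is a hypothetical layout, not the real rtl818x_csr, and eeprom_cmd_address is an illustrative helper. Forming a member address from a base that is not a real object is something the strict C standard does not bless, although drivers rely on compilers supporting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; the real rtl818x_csr has a different layout. */
struct demo_csr {
    uint8_t  mac[6];
    uint8_t  reserved[2];
    uint32_t tx_config;
    uint8_t  eeprom_cmd;
};

/* Compute where eeprom_cmd would live for a register block at 'base'.
 * Only an address is formed; nothing is read or written. */
static uintptr_t eeprom_cmd_address(uintptr_t base)
{
    assert(base % sizeof(uint32_t) == 0);   /* keep the overlay aligned */
    struct demo_csr *map = (struct demo_csr *)base;
    return (uintptr_t)&map->eeprom_cmd;
}
```

For a base of 0xFF00 the result equals 0xFF00 + offsetof(struct demo_csr, eeprom_cmd), which is the value a driver would then hand to its I/O access functions.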
Using 0xff00 works because C doesn't do sign extension of numbers.
You have to consider this from the device point of view.
Starting at address 0xFF00 inside the address space mapped for the rtl8187 device is a memory range that holds information structured the same way as the rtl818x_csr struct defined here.
So after you logically map that region you can start doing bus reads and writes on it to control the device. Like here (had to cut two more hyperlinks because I don't have the reputation necessary to post more than 3, but you get the point). These are just a couple of examples. If you read the entire file you'll see reads and writes are sprinkled everywhere.
In order to understand why that structure looks that way and why 0xFF00 is used instead of 0xBEEF or 0xDEAD you'll have to consult the datasheet for that device.
So if you want to start looking at kernel code, and especially device drivers, you'll need more than just the code. You'll need the datasheet or specifications as well. These can be rather difficult to find (see the gazillions of email threads and articles soliciting open documentation from the vendors).
Anyway, I hope I answered your question.
Happy hacking!
I have the following code:
void print(const char* str){
    system_call(4,1,str,strlen(str));
}
void foo2(void){ print("goo \n"); }
void buz(void){ ...}
int main(){
    char buf[256];
    void (*func_ptr)(void)=(void(*)(void))buf;
    memcpy(buf,foo2, ((void*)buz)-((void*)foo2));
    func_ptr();
    return 0;
}
The question is: why will this code fail?
The answer was something about a call to a function that is not made via a pointer using a relative address, but I haven't been able to figure out what's wrong here. Which line is the problematic one?
Thank you for your help.
Well to begin with, there is nothing which says that foo2() and buz() must be next to each other in memory. And for another, as you guess, the code must be relative for stunts like that to work. But most of all, it is not allowed by the standard.
As Chris Luts referred to, stack (auto) variables are not executable on many operating systems, to protect from attacks.
The first two lines in your main() function are problematic.
Line 1. (void(*)(void))buf
converting buf to a function pointer is undefined
Line 2. ((void*)buz)-((void*)foo2)
subtraction of pointers is undefined unless the pointers point within the same array.
Also, Section 5.8 Functions of H&S says "Although a pointer to a function is often assumed to be the address of the function's code in memory, on some computers a function pointer actually points to a block of information needed to invoke the function."
First and foremost, C's function pointer mechanism exists to abstract over calls to functions that share a signature. This is powerful and error-prone enough without these stunts.
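For contrast, the intended use of function pointers looks like the following sketch: a table of functions that all share one signature, selected at run time. The names binary_op_t, operations and apply are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Every entry in the table shares this signature. */
typedef int (*binary_op_t)(int, int);

static int add(int a, int b)      { return a + b; }
static int subtract(int a, int b) { return a - b; }
static int multiply(int a, int b) { return a * b; }

static const binary_op_t operations[] = { add, subtract, multiply };

/* Look up an operation by index and call it through the pointer. */
static int apply(size_t index, int a, int b)
{
    assert(index < sizeof operations / sizeof operations[0]);
    return operations[index](a, b);
}
```

Here the pointers always refer to real functions placed by the compiler and linker, so no code is copied and no integer or data pointer is ever cast to a function pointer.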
I can't see an advantage or any sense in trying to copy code from one place to another. As some have commented, it's not easy to tell how much of the code within a C function is position-relative or relocatable.
You tried copying the code of a function into a data memory region. Some microcontrollers would just tell you "Buzz off!". On machine architectures that have separate data and program memories, given a very understanding compiler (or one that recognizes data/code modifiers/attributes), it would compile to the specific code-to-data move instructions. It seems it would work... However, on architectures with separate data and code memories, executing instructions from data memory is not possible.
On the other hand, on "normal" PCs with shared data/code memory, it would likely not work either, because data and code segments are declared (by the loader) in the processor's MMU. Depending on the processor and OS, an attempt to run code in a data segment results in a segmentation fault.