Assembly procedures called and defined in C code

I have the following assembly code:
global stuff
stuff:
    ;do stuff
I wish to call this from C code; could it be called from a C program that contains it in an _asm() block?

Just because Linux does it is not a reason to do something. I am baffled by the desire to use inline assembly, but yes, sure, there are ways to do this which you could easily figure out. If you are asking this, you are not quite ready to write an operating system. Keep working on it, though; you need C and tool basics before you begin. An operating system is essentially a big bare-metal program.
If you tagged this nasm then you are not interested in inline asm anyway; just use real assembly with gas or nasm.
This
int fun ( void )
{
    return 5;
}
does/can become:
0000000000000000 <fun>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
so that means I can do this
.globl fun
fun:
    mov $0x5,%eax
    retq
and this
#include <stdio.h>
int fun ( void );
int main ( void )
{
    printf("%d\n",fun());
    return 0;
}
and build a binary linking the two parts which prints 5 when run.
So then with nasm I can
global fun
fun:
    mov eax,5
    ret
confirming it is the same machine code in this case or at least an equivalent.
0000000000000000 <fun>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
so I can link that in instead and it prints 5 as well.
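For reference, a minimal sketch of the build steps (the file names and an x86-64 Linux target are my assumptions):
nasm -f elf64 fun.asm -o fun.o   # nasm version (or: as fun.s -o fun.o for the gas version)
gcc main.c fun.o -o main
./main                           # prints 5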
So now I can do a simple inline version that looks very much like real asm, perhaps what you were asking about:
#include <stdio.h>
int fun ( void );
asm(".globl fun ; fun: mov $0x5,%eax ; retq");
int main ( void )
{
    printf("%d\n",fun());
    return 0;
}
This was using gcc; inline asm is tool-specific and should not be assumed to be portable.
And now you can grossly over complicate it from there.
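For example, one step up in complexity is GCC's extended asm syntax with output constraints; a minimal sketch of the same function (my illustration, and the name fun_inline is invented):
static inline int fun_inline ( void )
{
    int ret;
    asm ("mov $5, %0" : "=r" (ret));   /* "=r": let the compiler pick a register */
    return ret;
}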
Using an abstraction to perform I/O operations ("it's basically writing byte x to port y") in an OS (or anywhere) is absolutely the right thing to do (you do not want to inline something like that), so a separate function, be it real asm or C or some hybrid, is a good idea worth pursuing. At the end of the day, though, for an access function like that you need to be in complete control over the instruction used, so however you choose to do that is up to you. But elementary use of the tools and the language is required before starting any kind of work like this. You can examine existing operating systems as references, but this is yours, not theirs: your personal preferences, your knowledge of the language, tools, and assumptions, not someone else's. They may have a system-level implementation of something that you cannot see all of, and you can fall into traps by simply copying a piece here or there.
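As a concrete illustration of such an access function, here is a hedged sketch of a separate nasm routine for a port write (the name outb and the x86-64 System V calling convention are my assumptions):
; C prototype: void outb(unsigned short port, unsigned char value);
global outb
outb:
    mov dx, di    ; first argument (port) arrives in rdi; OUT takes the port in DX
    mov al, sil   ; second argument (value) arrives in rsi; OUT writes from AL
    out dx, al
    ret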

Related

What is in the address of main?

A simple piece of code like this
#include <stdio.h>
int main()
{
    return 0;
}
Checking the value in "&main" with gdb, I got 0xe5894855; I wonder what this is?
(gdb) x/x &main
0x401550 <main>: 0xe5894855
(gdb)
0xe5894855 is the hex encoding of the first instructions in main, but since you used x/x, gdb is displaying it as just a hex number, and the bytes appear reversed because x86-64 is little-endian. 55 is the opcode for push rbp, the first instruction of main. Use x/i &main to view the instructions.
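Decoded by hand, 0xe5894855 stored little-endian is the byte sequence 55 48 89 e5, so x/i should show something like this (reconstructed for illustration, not output from the asker's machine):
(gdb) x/2i &main
   0x401550 <main>:     push   %rbp
   0x401551 <main+1>:   mov    %rsp,%rbp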
Checking the value in "&main" with gdb, I got 0xe5894855; I wonder what this is?
The C expression &main evaluates to a pointer to (function) main.
The gdb command
x/x &main
prints (eXamines) the value stored at the address expressed by &main, in hexadecimal format (/x). The result in your case is 0xe5894855, but the C language does not specify the significance of that value. In fact, C does not define any strictly conforming way even to read it from inside the program.
In practice, that value probably represents the first four bytes of the function's machine code, interpreted as a four-byte unsigned integer in native byte order. But that depends on implementation details both of GDB and of the C implementation involved.
Ok so the 0x401550 is the address of main() and the hex goo to the right is the "contents" of that address, which doesn't make much sense since it's code stored there, not data.
To explain what that hex goo is coming from, we can toy around with some artificial examples:
#include <stdio.h>
int main (void)
{
    printf("%llx\n", (unsigned long long)&main);
}
Running this code on gcc x86_64, I get 401040 which is the address of main() on my particular system (this time). Then upon modifying the example into some ugly hard coding:
#include <stdio.h>
int main (void)
{
    printf("%llx\n", (unsigned long long)&main);
    printf("%.8x\n", *(unsigned int*)0x401040);
}
(Please note that accessing absolute addresses of program code memory like this is dirty hacking. It is very questionable practice and some systems might toss out a hardware exception if you attempt it.)
I get
401040
08ec8348
The gibberish second line is something similar to what gdb would give: the raw op codes for the instructions stored there.
(That is, it's actually a program that prints out the machine code used for printing out the machine code... and now my head hurts...)
Upon disassembling the executable and viewing the numerical op codes alongside the annotated assembly, I get:
main:
48 83 ec 08
401040 sub rsp,0x8
Where 48 83 ec 08 is the raw machine code, including the instruction sub with its parameters (x86 assembly isn't exactly my forte, but I believe 48 is a REX prefix and 83 is the op code for sub). Upon printing this as if it were integer data rather than machine code, the bytes got reordered according to x86 little-endian ordering, from 48 83 ec 08 to 08 ec 83 48. And that's the hex gibberish 08ec8348 from before.
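To see that byte reversal without a debugger, you can print the integer's bytes individually; a minimal sketch (the hard-coded value is the one read above):
#include <stdio.h>
int main (void)
{
    unsigned int word = 0x08ec8348;   /* the code bytes read as an integer */
    unsigned char *bytes = (unsigned char *)&word;
    int i;
    /* on a little-endian machine this prints 48 83 ec 08,
       the original machine-code byte order */
    for (i = 0; i < 4; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}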

Why are interrupts not generated by C code but easily generated by assembly instructions?

I am programming a little kernel and have implemented an IDT and interrupt handling.
This C code in my little kernel does not generate any interrupt:
int x = 5/0;
int f[4];
f[5] = 8;
But this assembly code does generate an interrupt:
asm("int $0");
(and the handlers work correctly).
Help me understand why this happens.
I also tried this:
int a = 3;
int b = 3;
int c = a-b;
int x = a/c;
Nothing I try in C code generates an exception for me.
Even this did not work:
int div_by_0(int a, int b){return a/b;}
int x = div_by_0(5, 0);
void fun ( void )
{
    int a = 3;
    int b = 3;
    int c = a-b;
    int x = a/c;
}
Disassembly of section .text:
0000000000000000 <fun>:
0: f3 c3 repz retq
There is no divide to trigger a divide by zero; it is all dead code.
And none of this has anything to do with the int instruction, these are completely separate topics.
As mentioned in the comments test it without using dead code.
int fun0 ( int x )
{
    return(5/x);
}
int fun1 ( void )
{
    return(fun0(0));
}
but understand that it still may not have the desired effect:
Disassembly of section .text:
0000000000000000 <fun0>:
0: b8 05 00 00 00 mov $0x5,%eax
5: 99 cltd
6: f7 ff idiv %edi
8: c3 retq
9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000000010 <fun1>:
10: 0f 0b ud2
because the optimizer for fun1 could see the fun0 function. You want to have the code under test in a separate optimization domain. In the case above, the idiv would then generate the divide by zero, and it then becomes an operating system issue as to how that is handled and whether it is visible to you.
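If you would rather keep both functions in one file, one way to create that separate optimization domain is GCC's noinline and noclone attributes; a sketch (the attribute approach is my suggestion, not from the original answer):
__attribute__((noinline,noclone)) int fun0 ( int x )
{
    return(5/x);   /* the idiv should now survive: callers cannot fold it */
}
int fun1 ( void )
{
    return(fun0(0));   /* the divide by zero now happens at run time */
}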
The problem you are seeing is because division by 0 is undefined behaviour in C/C++. The compiler has managed to do enough optimization at compile time to realize you are dividing by zero. The compiler is free to do anything, from halting and catching fire to making the result 0. Some compilers will emit a ud2 instruction to raise a CPU exception. The result is undefined.
You have a couple of options. Write your division in assembly and call that function from C/C++. Since you are using GCC (this works for Clang as well), you can also use inline assembly to generate a division by zero with something like:
#include <stdint.h> /* or replace uint16_t with unsigned short int */
void div_by_0 (void)
{
    asm ("div %b0" :: "a"((uint16_t)0));
    return;
}
This sets AX to 0 then divides AX by AL with the DIV instruction. 0/0 is undefined and will raise a Division Exception (#DE). This inline assembly should work with 16, 32, and 64-bit code.
In protected mode or long mode, using int $# (where # is the vector number) to trigger an exception is not always the same as getting a CPU-generated exception. Some exceptions generated by the CPU push an error code on the stack after the return address that needs to be cleaned up by the interrupt handler. If you were to use int $0x0d from ring 0 to cause a #GP exception, the interrupt handler would likely fault as it returns from the interrupt, because using int to generate an exception never places an error code on the stack. This isn't a problem with int $0, because #DE doesn't have an error code placed on the stack by the CPU.
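A common way handlers cope with this asymmetry is to push a dummy error code for the vectors that lack one, so every interrupt frame has the same layout; a rough gas-style sketch (the labels and the common handler are assumptions):
isr0:                # #DE: the CPU pushes no error code
    pushq $0         # dummy error code to keep the frame layout uniform
    pushq $0         # vector number
    jmp isr_common
isr13:               # #GP: the CPU has already pushed a real error code
    pushq $13        # vector number only
    jmp isr_common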
It turned out to be due to optimization flags. Due to a bit of confusion in the Makefiles, the -O2 flag was in effect. If you build with -O0 instead, exceptions work directly from C, and even this simple code throws an exception:
int x = 5/0;

Why does Windows require DLL data to be imported?

On Windows data can be loaded from DLLs, but it requires indirection through a pointer in the import address table. As a result, the compiler must know if an object that is being accessed is being imported from a DLL by using the __declspec(dllimport) type specifier.
This is unfortunate because it means that a header for a Windows library designed to be used as either a static library or a dynamic library needs to know which version of the library the program is linking to. This requirement does not apply to functions, which are transparently emulated for DLLs with a stub function that calls the real function, whose address is stored in the import address table.
On Linux the dynamic linker (ld.so) copies the values of all linked data objects from a shared object into a private mapped region for each process. This doesn't require indirection because the address of the private mapped region is local to the module, so its address is decided when the program is linked (and in the case of position independent executables, relative addressing is used).
Why doesn't Windows do the same? Is there a situation where a DLL might be loaded more than once, and thus require multiple copies of linked data? Even if that was the case, it wouldn't be applicable to read only data.
It seems that the MSVCRT handles this issue by defining the _DLL macro when targeting the dynamic C runtime library (with the /MD or /MDd flag), then using that in all standard headers to conditionally declare all exported symbols with __declspec(dllimport). I suppose you could reuse this macro if you only supported statically linking when using the static C runtime and dynamically linking when using the dynamic C runtime.
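A library that wants to support both linkage modes typically wraps this in a macro of its own; a minimal header sketch (the MYLIB_* names are my invention):
/* mylib.h */
#if defined(_WIN32) && defined(MYLIB_SHARED)   /* define MYLIB_SHARED when using the DLL */
#define MYLIB_API __declspec(dllimport)
#else
#define MYLIB_API                              /* static library (or non-Windows): plain extern */
#endif
extern MYLIB_API int mylib_data;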
References:
LNK4217 - Russ Keldorph's WebLog (emphasis mine)
__declspec(dllimport) can be used on both code and data, and its semantics are subtly different between the two. When applied to a routine call, it is purely a performance optimization. For data, it is required for correctness.
[...]
Importing data
If you export a data item from a DLL, you must declare it with __declspec(dllimport) in the code that accesses it. In this case, instead of generating a direct load from memory, the compiler generates a load through a pointer, resulting in one additional indirection. Unlike calls, where the linker will fix up the code correctly whether the routine was declared __declspec(dllimport) or not, accessing imported data requires __declspec(dllimport). If omitted, the code will wind up accessing the IAT entry instead of the data in the DLL, probably resulting in unexpected behavior.
Importing into an Application Using __declspec(dllimport)
Using __declspec(dllimport) is optional on function declarations, but the compiler produces more efficient code if you use this keyword. However, you must use __declspec(dllimport) for the importing executable to access the DLL's public data symbols and objects.
Importing Data Using __declspec(dllimport)
When you mark the data as __declspec(dllimport), the compiler automatically generates the indirection code for you.
Importing Using DEF Files (interesting historical notes about accessing the IAT directly)
How do I share data in my DLL with an application or with other DLLs?
By default, each process using a DLL has its own instance of all the DLL's global and static variables.
Linker Tools Warning LNK4217
What happens when you get dllimport wrong? (seems to be unaware of data semantics)
How do I export data from a DLL?
CRT Library Features (documents the _DLL macro)
Linux and Windows use different strategies for accessing data stored in dynamic libraries.
On Linux, an undefined reference to an object is resolved to a library at link time. The linker finds the size of the object and reserves space for it in the .bss or the .rdata segment of the executable. When executed, the dynamic linker (ld.so) resolves the symbol to a dynamic library (again), and copies the object from the dynamic library to the process's memory.
On Windows, an undefined reference to an object is resolved to an import library at link time, and no space is reserved for it. When the module is executed, the dynamic linker resolves the symbol to a dynamic library, and creates a copy on write memory map in the process, backed by a shared data segment in the dynamic library.
The advantage of a copy on write memory map is that if the linked data is unchanged, then it can be shared with other processes. In practice this is a trifling benefit which greatly increases complexity, both for the toolchain and programs using dynamic libraries. For objects which are actually written this is always less efficient.
I suspect, although I have no evidence, that this decision was made for a particular and now outdated use case. Perhaps it was common practice to use large (for the time) read only objects in dynamic libraries on 16-bit Windows (in official Microsoft programs or otherwise). Either way, I doubt anyone at Microsoft has the expertise and time to change it now.
In order to investigate the issue I created a program which writes to an object from a dynamic library. It writes one byte per page (4096 bytes) in the object, then writes the entire object, then retries the initial one byte per page write. If the object is reserved for the process before main is called, the first and third loops should take approximately the same time, and the second loop should take longer than both. If the object is a copy on write map to a dynamic library, the first loop should take at least as long as the second, and the third should take less time than both.
The results are consistent with my hypothesis, and analyzing the disassembly confirms that Linux accesses the dynamic library data at a link-time address, relative to the program counter. Surprisingly, Windows not only accesses the data indirectly, it also reloads the pointer to the data and its length from the import address table on every loop iteration, even with optimizations enabled. This was tested with Visual Studio 2010 on Windows XP, so maybe things have changed, although I wouldn't think that they have.
Here are the results for Linux:
$ dd bs=1M count=16 if=/dev/urandom of=libdat.dat
$ xxd -i libdat.dat libdat.c
$ gcc -O3 -g -shared -fPIC libdat.c -o libdat.so
$ gcc -O3 -g -no-pie -L. -ldat dat.c -o dat
$ LD_LIBRARY_PATH=. ./dat
local = 0x1601060
libdat_dat = 0x601040
libdat_dat_len = 0x601020
dirty= 461us write= 12184us retry= 456us
$ nm dat
[...]
0000000000601040 B libdat_dat
0000000000601020 B libdat_dat_len
0000000001601060 B local
[...]
$ objdump -d -j.text dat
[...]
400693: 8b 35 87 09 20 00 mov 0x200987(%rip),%esi # 601020 <libdat_dat_len>
[...]
4006a3: 31 c0 xor %eax,%eax # zero loop counter
4006a5: 48 8d 15 94 09 20 00 lea 0x200994(%rip),%rdx # 601040 <libdat_dat>
4006ac: 0f 1f 40 00 nopl 0x0(%rax) # align loop for efficiency
4006b0: 89 c1 mov %eax,%ecx # store data offset in ecx
4006b2: 05 00 10 00 00 add $0x1000,%eax # add PAGESIZE to data offset
4006b7: c6 04 0a 00 movb $0x0,(%rdx,%rcx,1) # write a zero byte to data
4006bb: 39 f0 cmp %esi,%eax # test loop condition
4006bd: 72 f1 jb 4006b0 <main+0x30> # continue loop if data is left
[...]
Here are the results for Windows:
$ cl /Ox /Zi /LD libdat.c /link /EXPORT:libdat_dat /EXPORT:libdat_dat_len
[...]
$ cl /Ox /Zi dat.c libdat.lib
[...]
$ dat.exe # note low resolution timer means retry is too small to measure
local = 0041EEA0
libdat_dat = 1000E000
libdat_dat_len = 1100E000
dirty= 20312us write= 3125us retry= 0us
$ dumpbin /symbols dat.exe
[...]
9000 .data
1000 .idata
5000 .rdata
1000 .reloc
17000 .text
[...]
$ dumpbin /disasm dat.exe
[...]
004010BA: 33 C0 xor eax,eax # zero loop counter
[...]
004010C0: 8B 15 8C 63 42 00 mov edx,dword ptr [__imp__libdat_dat] # store data pointer in edx
004010C6: C6 04 02 00 mov byte ptr [edx+eax],0 # write a zero byte to data
004010CA: 8B 0D 88 63 42 00 mov ecx,dword ptr [__imp__libdat_dat_len] # store data length in ecx
004010D0: 05 00 10 00 00 add eax,1000h # add PAGESIZE to data offset
004010D5: 3B 01 cmp eax,dword ptr [ecx] # test loop condition
004010D7: 72 E7 jb 004010C0 # continue loop if data is left
[...]
Here is the source code used for both tests:
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
typedef FILETIME time_l;
time_l time_get(void) {
    FILETIME ret; GetSystemTimeAsFileTime(&ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
    return 1LL*c2->dwLowDateTime/100-c1->dwLowDateTime/100+c2->dwHighDateTime*100000-c1->dwHighDateTime*100000;
}
#else
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
typedef struct timespec time_l;
time_l time_get(void) {
    time_l ret; clock_gettime(CLOCK_MONOTONIC, &ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
    return 1LL*c2->tv_nsec/1000-c1->tv_nsec/1000+c2->tv_sec*1000000-c1->tv_sec*1000000;
}
#endif
#ifndef PAGESIZE
#define PAGESIZE 4096
#endif
#ifdef _WIN32
#define DLLIMPORT __declspec(dllimport)
#else
#define DLLIMPORT
#endif
extern DLLIMPORT unsigned char volatile libdat_dat[];
extern DLLIMPORT unsigned int libdat_dat_len;
unsigned int local[4096];
int main(void) {
    unsigned int i;
    time_l t1, t2, t3, t4;
    long long int d1, d2, d3;
    t1 = time_get();
    for(i=0; i < libdat_dat_len; i+=PAGESIZE) {   /* dirty: touch one byte per page */
        libdat_dat[i] = 0;
    }
    t2 = time_get();
    for(i=0; i < libdat_dat_len; i++) {           /* write: touch every byte */
        libdat_dat[i] = 0xFF;
    }
    t3 = time_get();
    for(i=0; i < libdat_dat_len; i+=PAGESIZE) {   /* retry: touch one byte per page again */
        libdat_dat[i] = 0;
    }
    t4 = time_get();
    d1 = time_diff(&t1, &t2);
    d2 = time_diff(&t2, &t3);
    d3 = time_diff(&t3, &t4);
    printf("%-15s= %18p\n%-15s= %18p\n%-15s= %18p\n", "local", local, "libdat_dat", libdat_dat, "libdat_dat_len", &libdat_dat_len);
    printf("dirty=%9lldus write=%9lldus retry=%9lldus\n", d1, d2, d3);
    return 0;
}
I sincerely hope someone else benefits from my research. Thanks for reading!

Finding variable name from instruction pointer using debugging symbols

I'm looking for a way to find the names of the variables accessed by a given instruction (that performs a memory access).
Using debugging symbols and, for example, addr2line or objdump it's easy to convert instruction addresses into source code files + line numbers, but unfortunately often a single source code line contains more than one variable so this method does not have sufficiently fine granularity.
I've found that objdump is able to convert instruction addresses to global variables, but I haven't yet found a way to do this for local variables. For example, in the code below, I'd like to know that the instruction at address 0x4004c4 is accessing the local variable "local_hello" and that the instruction at address 0x4004c9 is accessing the local variable "local_hello2".
Hello.c:
int global_hello = 4;
int main(){
    int local_hello = 3;
    int local_hello2 = 0;
    local_hello2 = global_hello + local_hello;
    return local_hello2;
}
Using "objdump -S hello":
local_hello2 = global_hello + local_hello;
4004be: 8b 15 cc 03 20 00 mov 0x2003cc(%rip),%edx # 600890 <global_hello>
4004c4: 8b 45 fc mov -0x4(%rbp),%eax
4004c7: 01 d0 add %edx,%eax
4004c9: 89 45 f8 mov %eax,-0x8(%rbp)
This might work for simple programs built with no or only moderate optimization, but it will become difficult with aggressive compiler optimization.
You might want to look into gdb sources to learn about the efforts to connect variables to optimized compiler output.
What's your objective, after all?
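That said, for unoptimized builds like this one the DWARF debug information does record each local's frame-base offset, which you could match against operands like -0x4(%rbp) above; gdb's info scope command exposes it, with output along these lines (reconstructed, not verified against this exact binary):
(gdb) info scope main
Symbol local_hello is a variable at frame base reg $rbp offset -4, length 4.
Symbol local_hello2 is a variable at frame base reg $rbp offset -8, length 4.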

Short-circuiting on boolean operands without side effects

For the bounty: how can this behavior be disabled on a case-by-case basis without disabling or lowering the optimization level?
The following conditional expression was compiled on MinGW GCC 3.4.5, where a is of type signed long and m is of type unsigned long.
if (!a && m > 0x002 && m < 0x111)
The CFLAGS used were -g -O2. Here is the corresponding assembly output from GCC (dumped with objdump):
120: 8b 5d d0 mov ebx,DWORD PTR [ebp-0x30]
123: 85 db test ebx,ebx
125: 0f 94 c0 sete al
128: 31 d2 xor edx,edx
12a: 83 7d d4 02 cmp DWORD PTR [ebp-0x2c],0x2
12e: 0f 97 c2 seta dl
131: 85 c2 test edx,eax
133: 0f 84 1e 01 00 00 je 257 <_MyFunction+0x227>
139: 81 7d d4 10 01 00 00 cmp DWORD PTR [ebp-0x2c],0x110
140: 0f 87 11 01 00 00 ja 257 <_MyFunction+0x227>
120-131 can easily be traced as first evaluating !a, followed by the evaluation of m > 0x002. The first conditional jump does not occur until 133. By this time, two expressions have been evaluated regardless of the outcome of the first expression, !a. If a was non-zero, the expression could (and should) be concluded immediately, which is not done here.
How does this relate to the C standard, which requires Boolean operators to short-circuit as soon as the outcome can be determined?
The C standard only specifies the behavior of an "abstract machine"; it does not specify the generation of assembly. As long as the observable behavior of a program matches that on the abstract machine, the implementation can use whatever physical mechanism it likes for implementing the language constructs. The relevant section in the standard (C99) is 5.1.2.3 Program execution.
It is probably a compiler optimization since comparing integral types has no side effects. You could try compiling without optimizations or using a function that has side effects instead of the comparison operator and see if it still does this.
For example, try
if (printf("a") || printf("b")) {
    printf("c\n");
}
and it should print ac
As others have mentioned, this assembly output is a compiler optimization that doesn't affect program execution (as far as the compiler can tell). If you want to selectively disable this optimization, you need to tell the compiler that your variables should not be optimized across the sequence points in the code.
Sequence points occur at control expressions (the evaluations in if, switch, while and do, and all three sections of for), the logical OR and AND operators, the conditional operator (?:), the comma operator, and the return statement.
To prevent compiler optimization across these points, you must declare your variable volatile. In your example, you can specify
volatile long a;
unsigned long m;
{...}
if (!a && m > 0x002 && m < 0x111) {...}
The reason that this works is that volatile is used to instruct the compiler that it can't predict the behavior of an equivalent machine with respect to the variable. Therefore, it must strictly obey the sequence points in your code.
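If you only want this behaviour at a single use site rather than on every access to a, a cast through a volatile lvalue is a common case-by-case variant (my suggestion, same idea as above):
if (!*(volatile long *)&a && m > 0x002 && m < 0x111) {...}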
The compiler is optimising: it loads a into EBX, tests it and sets AL (part of EAX) from the result, does the second check into EDX, then branches based on combining EAX and EDX with test. This saves a branch and leaves the code running faster, without making any difference at all in terms of side effects.
If you compile with -O0 rather than -O2, I imagine it will produce more naive assembly that more closely matches your expectations.
The code is behaving correctly (i.e., in accordance with the requirements of the language standard) either way.
It appears that you're trying to find a way to generate specific assembly code. Of two possible assembly code sequences, both of which behave the same way, you find one satisfactory and the other unsatisfactory.
The only really reliable way to guarantee the satisfactory assembly code sequence is to write the assembly code explicitly. gcc does support inline assembly.
C code specifies behavior. Assembly code specifies machine code.
But all this raises the question: why does it matter to you? (I'm not saying it shouldn't, I just don't understand why it should.)
EDIT: How exactly are a and m defined? If, as you suggest, they're related to memory-mapped devices, then they should be declared volatile -- and that might be exactly the solution to your problem. If they're just ordinary variables, then the compiler can do whatever it likes with them (as long as it doesn't affect the program's visible behavior) because you didn't ask it not to.
