How do I use C libraries in assembler? - c

I want to know how to write a text editor in assembler. But modern operating systems require C libraries, particularly for their windowing systems. I found this page, which has helped me a lot.
But I wonder if there are details I should know. I know enough assembler to write programs that will use windows in Linux using GTK+, but I want to be able to understand what I have to send to a function for it to be a valid input, so that it will be easier to make use of all C libraries. For interfacing between C and x86 assembler, I know what can be learned from this page, and little else.

One of the most instructive ways to learn how to call C from assembler is to:
Write a C program that calls the C function of interest
Compile it, and look at the assembly listing (gcc -S)
This approach makes it easy to experiment by starting with something that is already known to work. You can change the C source and see how the generated code changes, and you can start with the generated code and modify it yourself.

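With the stack-based calling conventions used on 32-bit x86, a call comes down to three steps:
push the parameters on the stack
call the function
clear the stack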
The links you have in your question show all these steps.

The OS may define the calling standard (it pretty well must define the standard for invoking system calls), in which case you need only find where that is documented and read it closely.


How can I know where function ends in memory(get the address)- c/c++

I'm looking for a simple way to find where a function ends in memory. I'm working on a project that will find problems at run time in other code, such as code injection, viruses and so forth. My program will run alongside the code being checked, so I will have access to its memory. I don't have access to the source code itself. I would like to examine only specific functions from it. I need to know where functions start and end in stack. I'm working with Windows 8.1 64-bit.
In general, you cannot find where a function ends in memory, because the compiler could have optimized, inlined, cloned or removed that function, split it into different parts, etc. That function could be some system call mostly implemented in the kernel, or some function in an external shared library ("outside" of your program's executable)... From the point of view of the C11 standard (see n1570), your question makes no sense. That standard defines the semantics of the language, i.e. properties of the behavior of the produced program. See also explanations in this answer.
On some computers (Harvard architecture) the code would stay in a different memory, so there is no point in asking where that function starts or ends.
If you restrict your question to a particular C implementation (that is a specific compiler with particular optimization settings, for a specific operating system and instruction set architecture and ABI) you might (in some cases, not in all of them) be able to find the "end of a function" (but that won't be simple, and won't be failproof). For example, you could post-process the assembler code and/or the object file produced by the compiler, inspect the ELF executable and its symbol table, examine DWARF debug information, etc...
Your question smells a lot like an XY problem, so you should motivate it with a lot more explanation and context.
I need to know where functions start and end in stack.
Functions don't sit on the stack, but mostly in the code segment of your executable (or library). What is on the call stack is a sequence of call frames. The organization of the call frames is specific to your ABI. Some compiler options (e.g. -fomit-frame-pointer) make it difficult to explore the call stack (without access to the source code and help from the compiler).
I don't have access to the source code itself. I would like to examine only specific functions from it.
Your problem is still ill-defined, probably undecidable, much more complex than what you believe (since related to the halting problem), and there is considerable literature related to it (read about decompiler, static code analysis, anti-virus & malware analysis). I recommend spending several months or years learning more about compilers (start with the Dragon Book), linkers, instruction set architecture, ABIs. Then look into several proceedings of conferences related to ACM SIGPLAN etc. On a practical side, study the assembler code generated by compilers (e.g. use GCC with gcc -O2 -S -fverbose-asm....); the CppCon 2017 talk: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” is a nice introduction.
I'm working on a project that will find problems at run time in other code, such as code injection, viruses and so forth.
I hope you can dedicate several years of full-time work to your ambitious project. It is probably much more difficult than you thought, because optimizing compilers are much more complex than you believe (and malware uses various complex tricks to hide itself from inspection). Malware research is really difficult, but interesting.

Hiding Lua Source Code In C Application

I am currently developing a game in C and Lua. Because I plan to sell my game when it is finished, I would like to keep the source code closed. So my question is whether there is a way I can hide, or somehow access my Lua code from C, without the user being able to look. Right now, my executable is placed in the same place as my Lua code so it can be accessed.
Thanks for reading this, and any help is appreciated. Please ask me for more details if I am being too vague.
I think the correct answer is that you can't. You can only make life harder for the cracker. Better protection schemes than compiling code into bytecode have been cracked. If your game does not prove popular, it won't matter anyway. Write the game first, then worry about hiding your code.
Lua manual says:
[Lua] Chunks can also be precompiled into binary form; see program luac for details. Programs in source and compiled forms are interchangeable; Lua automatically detects the file type and acts accordingly.
This means you can use luac (Lua compiler) to compile your Lua code to binary form, which will not be easily readable, but can still be disassembled to find out what it does (which can be done even with C if you are determined enough).

hidden routines linked in c program

Hello,
When you disassemble a Win32 .exe compiled by a C compiler, it turns out that some compilers link 'hidden' routines into it, I think even when the C program is an empty one whose own code is 5 bytes or so. I understand that those 5 bytes are wrapped in the PE .exe format, but why add extra routines? They seem unnecessary to me, and even annoy me somewhat. What are they? Can they be omitted? As I understand it, a C program (leaving aside C++, which I know has some initial routines) should not need such additional hidden functions.
Thanks in advance for an answer, and perhaps a link to more detailed information, because this topic interests me a lot.
Edit:
OK, here is some disassembly I did a while back. Digital Mars and the old Borland command-line compiler (which I also tested) both produce much more code, and I am especially interested in bcc32, but they do not include readable names/symbols in the disassembly, so I will not post those here. The following are somewhat readable, but I am not experienced at understanding what they show ;-)
https://dl.dropbox.com/u/42887985/prog_devcpp.htm
https://dl.dropbox.com/u/42887985/prog_lcc.htm
https://dl.dropbox.com/u/42887985/prog_mingw.htm
https://dl.dropbox.com/u/42887985/prog_pelles.htm
Could someone explain what is in there? (I am afraid there may be some C++ cruft in here. I am interested only in pure C additions, not C++, but I am too tired right now to verify that it was compiled in C mode. The empty-main program had a .c extension, so I assumed it would be compiled as C, not C++.)
Thanks for any longer explanation of what this is.
Since your win32 exe file is a dynamically linked object file, it will contain the necessary data needed by the dynamic linker to do its job, such as names of libraries to link to, and symbols that need resolving.
Even a program with an empty main() will link with the c-runtime and kernel32.dll libraries (and probably others? - a while since I last did Win32 dev).
You should also be aware that main() is only the entry point of your own code. Quite a bit has already gone on before this point, such as retrieving and tokenizing the command line, setting up the locale, creating stderr, stdin and stdout, and setting up the other mechanisms required by the C runtime library, such as atexit(). Similarly, when your main() returns, the runtime does some clean-up, and at the very least needs to call the kernel to tell it that you're done.
As to whether it's necessary? Yes, unless you fancy writing your own program prologue and epilogue each time. There probably are ways of writing minimal, statically linked applications if you're sufficiently masochistic.
As for storage overhead, why are you getting so worked up? It's not enough to worry about.
There are several initialization functions that load whenever you run a program on Windows. These functions, among other things, call the main() function that you write - which is why you need either a main() or WinMain() function for your program to run. I'm not aware of other included functions though. Do you have some disassembly to show?
You don't have much detail to go on but I think most of what you're seeing is probably the routines of the specific C runtime library that your compiler works with.
For instance, there will be code that runs from the entry point recorded in the portable executable file and then calls the main(int argc, char **argv) that you wrote in your C program.

How does C code call assembly code (e.g. optimized strlen)?

I always read things about how certain functions within the C programming language are optimized by being written in assembly. Let me apologize if that sentence sounds a little misguided.
So, I'll put it clearly: How is it that when you call some functions like strlen on UNIX/C systems, the actual function you're calling is written in assembly? Can you write assembly right into C programs somehow or is it an external call situation? Is it part of the C standard to be able to do this, or is it an operating system specific thing?
The C standard dictates what each library function must do rather than how it is implemented.
Almost all known implementations of C are compiled into machine language. It is up to the implementers of the C compiler/library how they choose to implement functions like strlen. They could choose to implement it in C and compile it to an object, or they could choose to write it in assembly and assemble it to an object. Or they could implement it some other way. It doesn't matter so long as you get the right effect and result when you call strlen.
Now, as it happens, many C toolsets do allow you to write inline assembly, but that is absolutely not part of the standard. Any such facilities have to be included as extensions to the C standard.
At the end of the road compiled programs and programs in assembly are all machine language, so they can call each other. The way this is done is by having the assembly code use the same calling conventions (way to prepare for a call, prepare parameters and such) as the program written in C. An overview of popular calling conventions for x86 processors can be found here.
Many (most?) C compilers do happen to support inline assembly, though it's not part of the standard. That said, there's no strict need for a compiler to support any such thing.
First, recognize that assembly is mostly just human (semi-)readable machine code, and that C ends up as machine code anyway.
"Calling" a C function just generates a set of instructions that prepare registers, the stack, and/or some other machine-dependent mechanism according to some established calling convention, and then jumps to the start of the called function.
A block of assembly code can conform to the appropriate calling convention, and thus generate a blob of machine code that another blob of machine code that was originally written in C is able to call. The reverse is, of course, also possible.
The details of the calling convention, the assembly process, and the linking process (to link the assembly-generated object file with the C-generated object file) may all vary wildly between platforms, compilers, and linkers. A good assembly tutorial for your platform of choice will probably cover such details.
I happen to like the x86-centric PC Assembly Tutorial, which specifically addresses interfacing assembly and C code.
When C code is compiled by gcc, it's first compiled to assembler instructions, which are then assembled into a binary, machine-executable file. You can see the generated assembler instructions by specifying -S, as in gcc file.c -S.
Hand-written assembler code simply skips the first stage, the C-to-assembler compilation, and is then indistinguishable from code compiled from C.
One way to do it is to use inline assembler. That means you can write assembler code directly into your C code. The specific syntax is compiler-specific. For example, see GCC syntax and MS Visual C++ syntax.
You can write inline assembly in your C code. The syntax for this is highly compiler-specific, but the asm keyword is usually used. Look into inline assembly for more information.

removing unneeded code from gcc and mingw

I noticed that MinGW adds a lot of code before calling main(). I assume it is for parsing command-line parameters, since one of those functions is called __getmainargs(). Lots of strings are also added to the final executable, such as mingwm.dll and some error strings (in case the app crashes) that say "mingw runtime error" or something like that.
My question is: is there a way to remove all this stuff? I don't need all these things. I tried tcc (Tiny C Compiler) and it did the job, but it is not cross-platform like gcc (Solaris/Mac).
Any ideas?
Thanks.
Yes, you really do need all those things. They're the startup and teardown code for the C environment that your code runs in.
Other than non-hosted environments such as low-level embedded solutions, you'll find pretty much all C environments have something like that. Things like /lib/crt0.o under some UNIX-like operating systems or crt0.obj under Windows.
They are vital to successful running of your code. You can freely omit library functions that you don't use (printf, abs and so on) but the startup code is needed.
Some of the things that it may perform are initialisation of atexit structures, argument parsing, initialisation of structures for the C runtime library, initialisation of C/C++ pre-main values and so forth.
It's highly OS-specific and, if there are things you don't want to do, you'll probably have to get the source code for it and take them out, in essence providing your own cut-down replacement for the object file.
You can safely assume that your toolchain does not include code that is not needed and could safely be left out.
Make sure you compiled without debug information, and run strip on the resulting executable. Anything more intrusive than that requires intimate knowledge of your toolchain, and can result in rather strange behaviour that will be hard to debug - i.e., if you have to ask how it could be done, you shouldn't try to do it.
