Delphi dcu to obj - c

Is there a way to convert a Delphi .dcu file to an .obj file so that it can be linked using a compiler like GCC? I've not used Delphi for a couple of years but would like to use if for a project again if this is possible.

Delphi can output .obj files, but they are in a 32-bit variant of Intel OMF. GCC, on the other hand, works with ELF (Linux, most Unixes), COFF (on Windows) or Mach-O (Mac).
But that alone is not enough. It's hard to write much code without using the runtime library, and the implementation of the runtime library will be dependent on low-level details of the compiler and linker architecture, for things like correct order of initialization.
Moreover, there's more to compatibility than just the object file format; code on Linux, in particular, needs to be position-independent, which means it can't use absolute values to reference global symbols, but rather must index all its global data from a register or relative to the instruction pointer, so that the code can be relocated in memory without rewriting references.
DCU files are a serialization of the Delphi symbol tables and code generated for each proc, and are thus highly dependent on the implementation details of the compiler, which changes from one version to the next.
All this is to say that it's unlikely that you'd be able to get much Delphi (dcc32) code linking into a GNU environment, unless you restricted yourself to the absolute minimum of non-managed data types (no strings, no interfaces) and procedural code (no classes, no initialization section, no data that needs initialization, etc.)

(answer to various FPC remarks, but I need more room)
For a good understanding, you have to know that a delphi .dcu translates to two differernt FPC files, .ppu file with the mentioned symtable stuff, which includes non linkable code like inline functions and generic definitions and a .o which is mingw compatible (COFF) on Windows. Cygwin is mingw compatible too on linking level (but runtime is different and scary). Anyway, mingw32/64 is our reference gcc on Windows.
The PPU has a similar version problem as Delphi's DCU, probably for the same reasons. The ppu format is different nearly every major release. (so 2.0, 2.2, 2.4), and changes typically 2-3 times an year in the trunk
So while FPC on Windows uses own assemblers and linkers, the .o's it generates are still compatible with mingw32 In general FPC's output is very gcc compatible, and it is often possible to link in gcc static libs directly, allowing e.g. mysql and postgres linklibs to be linked into apps with a suitable license. (like e.g. GPL) On 64-bit they should be compatible too, but this is probably less tested than win32.
The textmode IDE even links in the entire GDB debugger in library form. GDB is one of the main reasons for gcc compatibility on Windows.
While Barry's points about the runtime in general hold for FPC too, it might be slightly easier to work around this. It might only require calling certain functions to initialize the FPC rtl from your startup code, and similarly for the finalize. Compile a minimal FPC program with -al and see the resulting assembler (in the .s file, most notably initializeunits and finalizeunits) Moreover the RTL is more flexible and probably more easily cut down to a minimum.
Of course as soon as you also require exceptions to work across gcc<->fpc bounderies you are out of luck. FPC does not use SEH, or any scheme compatible with anything else ATM. (contrary to Delphi, which uses SEH, which at least in theory should give you an advantage there, Barry?) OTOH, gcc might use its own libunwind instead of SEH.
Note that the default calling convention of FPC on x86 is Delphi compatible register, so you might need to insert proper cdecl (which should be gcc compatible) modifiers, or even can set it for entire units at a time using {$calling cdecl}
On *nix this is bog standard (e.g. apache modules), I don't know many people that do this on win32 though.
About compatibility; FPC can compile packages like Indy, Teechart, Zeos, ICS, Synapse, VST
and reams more with little or no mods. The dialect levels of released versions are a mix of D7 and up, with the focus on D7. The dialect level is slowly creeping to D2006 level in trunk versions. (with for in, class abstract etc)

Yes. Have a look at the Project Options dialog box:
(High-Res)

As far as I am aware, Delphi only supports the OMF object file format. You may want to try an object format converter such as Agner Fog's.

Since the DCU format is proprietary and has a tendency of changing from one version of Delphi to the next, there's probably no reliable way to convert a DCU to an OBJ. Your best bet is to build them in OBJ format in the first place, as per Andreas's answer.

Related

How do I build newlib for size optimization?

I'm building an arm-eabi-gcc toolchain with Newlib 2.5.0 as the target C library.
The target embedded system would prefer smaller code size over execution speed. How do I configure newlib to favour smaller code size?
The default build does things like produce a version of strstr that is over 1KB in code size.
There is fat in Newlib that can be addressed with Newlib-nano, which is already part of GCC ARM Embedded, as discussed here (Note the article is from 2014, so the information may be out-dated, but there appears to be Newlib-nano support in the current v6-2017 too).
It removes some features added after C89 that are rarely used in MCU based embedded systems, simplifies complex functions such as formatted I/O, and removes wide character support from non-wide character specific functions. Critically in respect to this question the default build is already size optimised (-Os).
Configure newlib like this:
CFLAGS_FOR_TARGET="-DPREFER_SIZE_OVER_SPEED=1 -Os" \
../newlib-2.5.0/configure
(where I've omitted the rest of the arguments I used for configure, they don't change based on this issue).
There isn't a configure flag, but the configure script reads certain variables from the environment. CFLAGS_FOR_TARGET means flags used when building for the target system.
Not to be confused with CFLAGS_FOR_BUILD , which are flags that would be used if the build system needed to make any auxiliary executables to execute on the build system to help with the build process.
I couldn't find any official documentation on this, but searching the source code, it contained many instances of testing for PREFER_SIZE_OVER_SPEED or __OPTIMIZE_SIZE__. Based on a quick grep, these two flags are almost identical. The only difference was a case in the printf family that if a null pointer is passed for %s, then the former will translate it to (null) but the latter bulls on ahead , probably causing a crash.

Dynamically change the running code by writing into the __FILE__?

I got to know of a way to print the source code of a running code in C using the __FILE__ macro. As such I can seek the location and use putchar() to alter the contents of the file.
Is it possible to dynamically change the running code using this method?
Is it possible to dynamically change the running code using this method ?
No, because once a program is compiled it no longer depends on the source file.
If you want learn how to alter the behavior of an process that is already running from within the process itself, you need to learn about assembly for the architecture you're using, the executable file format on your system, and the process API on your system, at the very least.
As most other answers are explaining, in practical terms, most C implementations are compilers. So the executable that is running has only an indirect (and delayed) relation with the source code, because the source code had to be processed by the compiler to produce that executable.
Remember that a programming language is (not a software but...) a specification, written in some report. Read n1570, draft specification of C11. Most implementations of C are command-line compilers (e.g. GCC & Clang/LLVM in the free software realm), even if you might find interpreters.
However, with some operating systems (notably POSIX ones, such as MacOSX and Linux), you could dynamically load some plugin. Or you could create, in some other way (such as JIT compilation libraries like libgccjit or LLVM or libjit or GNU lightning), a fresh function and dynamically get a pointer to it (and that is not stricto sensu conforming to the C standard, where a function pointer should point to some existing function of your program).
On Linux, you might generate (at runtime of your own program, linked with -rdynamic to have its names usable from plugins, and with -ldl library to get the dynamic loader) some C code in some temporary source file e.g. /tmp/gencode.c, run a compilation (using e.g. system(3) or popen(3)) of that emitted code as a /tmp/gencode.so plugin thru a command like e.g. gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so, then dynamically load that plugin using dlopen(3), find function pointers (from some conventional name) in that loaded plugin with dlsym(3), and call indirectly that function pointer. My manydl.c program shows that is possible for many hundred thousands of generated C files and loaded plugins. I'm using similar tricks in my GCC MELT. See also this and that. Notice that you don't really "self-modify" C code, you more broadly generate additional C code, compile it (as some plugin, etc...), and then load it -as an extension or plugin- then use it.
(for pragmatical reasons including ease of debugging, I don't recommend overwriting some existing C file, but just emitting new C code in some fresh temporary .c file -from some internal AST-like representation- that you would later feed to the compiler)
Is it possible to dynamically change the running code?
In general (at least on Linux and most POSIX systems), the machine code sits in a read-only code segment of the virtual address space so you cannot change or overwrite it; but you can use indirection thru function pointers (in your C code) to call newly loaded code (e.g. from dlopen-ed plugins).
However, you might also read about homoiconic languages, metaprogramming, multi-staged programming, and try to use Common Lisp (e.g. using its SBCL implementation, which compile to machine code at every REPL interaction and at every eval). I also recommend reading SICP (an excellent and freely available introduction to programming, with some chapters related to metaprogramming approaches)
PS. Dynamic loading of plugins is also possible in Windows -which I don't know- with LoadLibrary, but with a very different (and incompatible) model. Read Levine's linkers and loaders.
A computer doesn't understand the code as we do. It compiles or interprets it and loads into memory. Our modification of code is just changing the file. One needs to compile it and link it with other libraries and load it into memory.
ptrace() is a syscall used to inject code into a running program. You can probably look into that and achieve whatever you are trying to do.
Inject hello world in a running program. I have tried and tested this sometime before.

gcc generating a list of function and variable names

I am looking for a way to get a list of all the function and variable names for a set of c source files. I know that gcc breaks down those elements when compiling and linking so is there a way to piggyback that process? Or any other tool that could do the same thing?
EDIT: It's mostly because I am curious, I have been playing with things like make auto dependency and graphing include trees and would like to be able to get more stats on the source files. And it seems like something that would already exist but i haven't found any options or flags for it.
If you are only interested by names of global functions and variables, you might (assuming you are on Linux) use the nm or objdump utilities on the ELF binary executable or object files.
Otherwise, you might customize the GCC compiler (assuming you have a recent version, e.g. 5.3 or 6 at least) thru plugins. You could code them directly in C++, or you might consider using GCC MELT, a Lisp-like domain specific language to customize GCC. Perhaps even the findgimple mode of GCC MELT might be enough....
If you consider extending GCC, be aware that you'll need to spend a significant time (perhaps months) understanding its internal representations (notably Generic Trees & Gimple) in details. The links and slides on GCC MELT documentation page might be useful.
Your main issue is that you probably need to understand most of the details about GCC internal representations, and that takes time!
Also, the details of GCC internals are slightly changing from one version of GCC to the next one.
You could also consider (instead of working inside GCC) using the Clang/LLVM framework (but learning that is also a lot of time). Maybe you might also look into Frama-C or Coccinnelle.
Another approach might be to compile with debug info and parse DWARF information.
PS. My point is that your problem is probably much more difficult than what you believe. Parsing C is not that simple ... You might spend months or even years working on that... And details could be target-processor & system & compiler specific...

Can I write (x86) assembly language which will build with both GCC and MSVC?

I have a project which is entirely written in C. The same C files can be compiled using either GCC for Linux or MSVC for Windows. For performance reasons, I need to re-write some of the code as x86 assembly language.
Is it possible to write this assembly language as a source file which will build with both the GCC and MSVC toolchains? Alternatively, if I write an assembly source file for one toolchain, is there a tool to convert it to work with the other?
Or, am I stuck either maintaining two copies of the assembly source code, or using a third-party assembler such as NASM?
I see two problems:
masm and gas have different syntax. gas can be configured to use Intel syntax with the .syntax intel,noprefix directive, but even then small differences remain (such as, different directives). A possible approach is to preprocess your assembly source with the C preprocessor, using macros for all directives that differ between the two. This also has the advantage of providing a unified comment syntax.
However, just using a portable third party assembler like nasm is likely to be less of a hassle.
Linux and Windows have different calling conventions. A possible solution for x86-32 is to stick to a well-supported calling convention like stdcall. You can tell gcc what calling convention to use when calling a function using function attributes. For example, this would declare foo to use the stdcall calling convention:
extern int foo(int x, int y) __attribute__((stdcall));
You can do the same thing in MSVC with __declspec, solving this issue.
On x86-64, a similar solution is likely possible, but I'm not exactly sure what attributes you have to set.
You can of course also use the same cpp-approach as for the first problem to generate slightly different function prologues and epilogues depending on what calling convention you need. However, this might be less maintainable.

Linking object files from different C compilers

Say I have two compilers, or even a single compiler with two different option sets. Each compiler compiles some C code into an object and I try to link the two .o files with a common linker. Will this succeed?
My initial thought is: not always. If the compilers are using the same object file format and have compatible options, then it would succeed. But, if the compilers have conflicting options, or (and this is an easy one) are using two different object file formats, it would not work correctly.
Does anyone have more insight on this? What standards would the object files need to comply with to gain confidence that this will work?
Most flavors of *nix OSes have well defined and open ABI and mostly use ELF object file format, so it is not a problem at all for *nix.
Windows is less strictly defined and different compilers may vary in some calling conventions (for example __fastcall may not be supported by some compilers or may have different behavior, see https://en.wikipedia.org/wiki/X86_calling_conventions). But main set of calling conventions (__stdcall, _cdecl, etc) is standard enough to ensure successfull call of function compiled by one compiler from another compiler, otherwise the program won't work at all, since unlike Linux every system call in Windows is wrapped by function from DLL which you need to successfully call.
The other problem is that there is no standard common format for object files. Although most tools (MS, Intel, GCC (MinGW), Clang) use COFF format, some may use OMF (Watcom) or ELF (TinyC).
Another problem is so called "name mangling". Although it was introduced to support overloading C++ functions with the same name, it was adopted by C compilers to prevent linkage of functions defined with different calling conventions. For example, function int _cdecl fun(void); will get compiled name _fun whilst int __stdcall fun(void); will get name _fun#0. More information on name mangling see here: https://en.wikipedia.org/wiki/Name_mangling.
At last, default behavior may differ for some compilers, so yes, options may prevent successful linking of object files produced by different compilers or even by the same compiler. For example, TinyC use default convention _cdecl, whilst CLang use __stdcall. TinyC with default options may not produce code that may be linked with other because it doesn't prepend name by underscore sign. To make it cross-linkable it needs -fleading-underscore option.
But keeping in mind all said above the code may successfully be intermixed. For example, I successfully linked together code produced by Visual Studio, Intel Parallel Studio, GCC (MinGW), Clang, TinyC, NASM.

Resources