Symbols already defined error when compiling assembly output from Visual Studio - c

Currently using Microsoft Visual Studio Community 2019 Version 16.4.4 and compiling a C project.
I want to be able to output assembly from VS, make modifications to the assembly, and then compile the modified assembly into an executable file. Below I talk about my attempts following instructions from a previous SO question, but if there is an alternative (even not using VS) I would appreciate those suggestions.
As a test, I have been using a simple "hello world" program:
#include <stdio.h>
int main(int argc, char* argv[]) {
printf("Hello world\n");
return 0;
}
I have tried compiling this program following the instructions given by this answer but I get the following errors:
LNK2005 ___local_stdio_printf_options already defined in helloworld.obj
LNK1169 one or more multiply defined symbols found
It looks like VS is actually compiling the assembly to object files, but the linker says that the "___local_stdio_printf_options" symbol is already defined at link-time. When I look back at the assembly code output from VS, this seems to be true:
___local_stdio_printf_options PROC ; COMDAT
; File C:\Program Files (x86)\Windows Kits\10\Include\10.0.18362.0\ucrt\corecrt_stdio_config.h
; Line 86
push ebp
mov ebp, esp
; Line 88
mov eax, OFFSET ?_OptionsStorage@?1??__local_stdio_printf_options@@9@9 ; `__local_stdio_printf_options'::`2'::_OptionsStorage
; Line 89
pop ebp
ret 0
___local_stdio_printf_options ENDP
So it looks like a solution may be to prevent the VS compiler from defining ___local_stdio_printf_options, or to change a linker setting so that it accepts the duplicate definition. I don't know why VS would produce code that defines external symbols like this. I would really appreciate any help.

Related

How to make Watcom C compiler (wcc) and WASM generate the same 8086 machine code?

For this C source code:
int add(int a, int b) { return a + b; }
, the Watcom C Compiler for 8086 (wcc -s -ms -os -0 prog.c) generates the following machine code (hex): 01 D0 C3, disassembling to add ax, dx (01 D0) + ret (C3).
For this assembly source code:
PUBLIC add_
EXTRN _small_code_:BYTE
_TEXT SEGMENT BYTE PUBLIC USE16 'CODE'
add_: add ax, dx
ret
_TEXT ENDS
END
, the Watcom Assembler (WASM, wasm -ms -0 prog.wasm) generates the following machine code (hex): 03 C2 C3, disassembling to add ax, dx (03 C2) + ret (C3).
Thus they generate a different binary encoding of the same 8086 assembly instruction add ax, dx.
FYI: if I implement the function in Watcom C inline assembly, the machine code output is the same as with WASM.
A collection of different instruction encodings:
add ax, dx. wcc: 01 D0; wasm: 03 C2.
mov bx, ax. wcc: 89 C3; wasm: 8B D8.
add ax, byte 9. wcc: 05 09 00; wasm: 83 C0 09.
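To make the first two differences concrete, here is a minimal sketch of how the two register-to-register forms are encoded. The 8086 has two opcodes for such instructions: an "r/m16, r16" form (01 for add, 89 for mov) and an "r16, r/m16" form (03 for add, 8B for mov), and the destination register lands in a different field of the ModRM byte in each. The helper names below (modrm_reg_reg, encode_rm_r, encode_r_rm) are made up for illustration and are not part of any Watcom tool. The third case (add ax, 9) is a different kind of choice, between the accumulator imm16 form and the sign-extended imm8 form, and is not covered here.

```c
#include <assert.h>

/* 3-bit numbers of the 16-bit registers as used in a ModRM byte. */
enum reg16 { AX = 0, CX = 1, DX = 2, BX = 3 };

/* Build a register-to-register ModRM byte: mod = 11, then reg, then r/m. */
static unsigned char modrm_reg_reg(enum reg16 reg, enum reg16 rm)
{
    assert(reg <= 7 && rm <= 7);
    return (unsigned char)(0xC0u | ((unsigned)reg << 3) | (unsigned)rm);
}

/* "op r/m16, r16" form: the destination goes into the r/m field.
   This is the form seen in the wcc output (01 D0, 89 C3). */
void encode_rm_r(unsigned char opcode, enum reg16 dst, enum reg16 src,
                 unsigned char out[2])
{
    out[0] = opcode;
    out[1] = modrm_reg_reg(src, dst);
}

/* "op r16, r/m16" form: the destination goes into the reg field.
   This is the form seen in the WASM output (03 C2, 8B D8). */
void encode_r_rm(unsigned char opcode, enum reg16 dst, enum reg16 src,
                 unsigned char out[2])
{
    out[0] = opcode;
    out[1] = modrm_reg_reg(dst, src);
}
```

With these helpers, encode_rm_r(0x01, AX, DX, buf) gives 01 D0 and encode_r_rm(0x03, AX, DX, buf) gives 03 C2, which are the two encodings of add ax, dx listed above.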
How can I make the Watcom C compiler (for C code) and WASM generate the instructions with the same binary encoding? Is there a command-line flag or some other configuration option for either? I wasn't able to find any.
The reason why I need it is that I'd like to reproduce an executable program file written in Watcom C by writing WASM source only, and I want the final output be bit-by-bit identical to the original.
This answer is inspired by a comment by @RaymondChen.
Here is a cumbersome, multistep way to change the machine code emitted by wcc to match the output of wasm:
Compile the C source code with wcc (part of OpenWatcom) to an .obj file as usual.
Use dmpobj (part of OpenWatcom) to extract the machine code bytes of the _TEXT segment.
Use ndisasm (part of NASM, ndisasm -b 16 file.obj) to disassemble the machine code bytes.
Write and run a custom source text filter to keep only the assembly instructions and convert them to WASM syntax.
Use wasm (part of OpenWatcom) to generate the 2nd .obj file.
Use dmpobj to extract the machine code bytes of the _TEXT segment of the 2nd .obj file.
Write and run a custom binary filter to replace the machine code bytes in the _TEXT segment of the 1st .obj file with the equivalent bytes extracted from the 2nd .obj file, using the offsets in the outputs of the dmpobj invocations.
These steps avoid using wdis -a (conversion from .obj to assembly source), because that's lossy (it doesn't include everything in the .obj file), which can potentially make unwanted changes, causing problems later.
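The core of the binary filter in the last step could look roughly like the sketch below. It assumes that the file offset of the _TEXT bytes and the replacement bytes have already been obtained from the dmpobj output (that parsing is not shown), and that the replacement has the same length as the bytes it overwrites, as in the encodings listed in the question. patch_bytes is a hypothetical helper name, not an OpenWatcom function.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite repl_len bytes of an in-memory .obj image, starting at offset,
   with the replacement bytes. The image size never changes.
   Returns 0 on success, -1 if the range does not fit inside the image. */
int patch_bytes(unsigned char *image, size_t image_len, size_t offset,
                const unsigned char *repl, size_t repl_len)
{
    if (image == NULL || repl == NULL)
        return -1;
    /* Written this way to avoid overflow in offset + repl_len. */
    if (offset > image_len || repl_len > image_len - offset)
        return -1;
    memcpy(image + offset, repl, repl_len);
    return 0;
}
```

Because the replacement is done in place and keeps the length, the record lengths inside the .obj file stay valid. Note that if the .obj records carry checksums, those would also have to be updated; that part is not handled in this sketch.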

Is there a way to call 6502 assembly code from C file?

I am using cc65, which compiles code for the 6502, together with its 6502 simulator. I want to link 6502 assembly code with C code and produce a binary file that I can execute.
My C code "main.c":
#include<stdio.h>
extern void foo(void);
int main() {
foo();
return 0;
}
My 6502 code "foo.s":
foo:
LDA #$00
STA $0200
The code might seem very simple but I am just trying to achieve the successful linking. But I cannot get rid of the following error:
Unresolved external '_foo' referenced in:
main.s(27)
ld65: Error: 1 unresolved external(s) found - cannot create output file
You need to export it from the assembly module - with the same decoration the C compiler uses:
_foo:
.export _foo
LDA #$00
STA $0200
RTS ; return to the C caller
This links with:
cl65 -t sim6502 main.c foo.s -o foo
You might also need to look into the calling conventions.

GDB doesn't recognize some C functions

So I'm new to Linux and just got Ubuntu 16.04.2 running on a VM. I've installed gcc/g++ on here in the terminal, but when I run my program in GDB, as soon as I get to a strcmp function, this pops up for many lines.
strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:24
24 ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: No such file or directory.
And when I go further down:
strlen () at ../sysdeps/x86_64/strlen.S:66
66 ../sysdeps/x86_64/strlen.S: No such file or directory.
So I'm guessing it just doesn't recognize my C library..
I realize I can step through this after a couple of tries, but this comes up for all my c functions and when I use GDB on my school server, I don't run into this issue. Any help would be appreciated.
"I get to a strcmp function, this pops up for many lines."
When you use s (step) or si (stepi, step one instruction), what you see for string and memory functions such as strcmp, memcpy, memcmp, and strlen is correct, and GDB does recognize your C library (Ubuntu 16.04.2 amd64 installed from the ISO in a VM already has the libc6-dbg debugging package preinstalled for libc, your C library).
strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:24
24 ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S: No such file or directory.
strlen () at ../sysdeps/x86_64/strlen.S:66
66 ../sysdeps/x86_64/strlen.S: No such file or directory.
What we see here is that GDB was able to find debugging information for both strcmp and strlen and get line numbers from it, but these standard C library functions are not C functions! They are written in assembler (one is optimized with SSE2), as the .S suffix of their source references shows. You can run s or si several times after entering them to see the source line numbers increase.
"it just doesn't recognize"
GDB did all it can: it found the debugging info for your system C library (not trivial, since the debug info is split into a separate file with a different name somewhere under /usr/lib/debug/lib/x86_64-linux-gnu/) and worked out which instruction comes from which source line. What it can't do is open the source file, because that file is neither part of the preinstalled Ubuntu image nor part of any Ubuntu (Debian) binary package.
What can you do if you want to look inside this system library function:
1) Check the disassembly of the function with the GDB command disassemble (by default it prints the current function). It will be very close to the source of this implementation, since the function was originally written in assembler; all you lose are the comments and the macro structure:
Dump of assembler code for function strlen:
0x000address70 <+0>: pxor %xmm0, %xmm0
=> 0x000address74 <+4>: pxor %xmm1, %xmm1
0x000address78 <+8>: pxor %xmm2, %xmm2
0x000address7c <+12>: pxor %xmm3, %xmm3
...
2) Or you can watch instructions as they are executed with the display command, for example display/i $pc or disp/2i $pc (the first prints one instruction at the current PC, where $pc is GDB's generic name for EIP or RIP; the second prints two instructions, the current one and the next).
3) Or you can create the path that gdb expects and copy the original source into it: mkdir -p ../sysdeps/x86_64/ and save the assembler source for your library version in that directory. The glibc-2.23 version of strlen.S is available here (GitHub mirror of the authors' Git repository): https://github.com/bminor/glibc/blob/glibc-2.23/sysdeps/x86_64/strlen.S#L66
4) Or you can download the Ubuntu source for libc with apt source libc (into some stable path such as ~/src, after mkdir ~/src) and point gdb to it with directory ~/src/glibc-2.23/sysdeps (you may need to add a real subdirectory to account for the ../ relative part used by the Ubuntu libc build).
"this comes up for all my c functions"
No, your own C functions produce a different kind of output (not "something.S: No such file or directory"). Also, you should enable debugging symbols when building your program by passing -g to gcc (or your other compiler).
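If you want to try this yourself, a small function like the sketch below is enough to reproduce the behaviour. find_word and the file name demo.c are hypothetical names chosen for illustration; you would call the function from a short main of your own. The GDB commands in the comment are standard ones: step enters a call, next steps over it, and finish runs until the current function returns.

```c
#include <stddef.h>
#include <string.h>

/* Build with debug info and without optimization, for example:
 *     gcc -g -O0 demo.c -o demo
 * Then, inside gdb:
 *     break find_word
 *     run
 *     next      (steps over strcmp, staying in your own C code)
 *     step      (may enter the assembler implementation of strcmp)
 *     finish    (runs until strcmp returns to find_word)
 */

/* Return the index of the first entry equal to key, or -1 if none matches. */
int find_word(const char *key, const char *const words[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(words[i], key) == 0)
            return (int)i;
    }
    return -1;
}
```

Lines inside find_word show up as normal C source in GDB, while the "No such file or directory" message only appears once you step into the library routine itself.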

What are .LFB .LBB .LBE .LVL .loc in the compiler generated assembly code

When I look at the assembly code generated by GCC, there are many lines beginning with .LBB and a number. They do not seem to be instructions; they look more like markers of some kind.
What are .LFB, .LVL, .LBB, .LBE, etc. in the compiler-generated assembly code?
Does .loc mean "line of code"? Do those lines just describe the symbol table?
Here is a piece of code,
main:
.LFB1:
.loc 1 8 0
.cfi_startproc
.LVL2:
.LBB4:
.LBB5:
.loc 1 2 0
movsd b(%rip), %xmm0
.LBE5:
.LBE4:
.loc 1 10 0
xorl %eax, %eax
.LBB7:
.LBB6:
.loc 1 2 0
mulsd a(%rip), %xmm0
.LBE6:
.LBE7:
.loc 1 9 0
movsd %xmm0, a(%rip)
.LVL3:
.loc 1 10 0
ret
.cfi_endproc
.loc
As mentioned by Ferruccio .loc is a debugging directive, and it only appears in GCC 4.8.2 if you tell the compiler to generate debugging information with -ggdb.
.loc is documented at https://sourceware.org/binutils/docs-2.18/as/LNS-directives.html#LNS-directives and the exact output depends on the debug data format (DWARF2, etc.).
The others are labels.
.L prefix
GCC uses the .L for local labels.
By default, GAS does not emit symbols for these labels in the assembled output, as documented at: https://sourceware.org/binutils/docs-2.18/as/Symbol-Names.html
A local symbol is any symbol beginning with certain local label prefixes. By default, the local label prefix is `.L' for ELF systems
Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging. You may use the `-L' option (see Include Local Symbols: -L) to retain the local symbols in the object files.
So if you compile with: as -c a.S, nm a.o does not show those labels at all.
This only makes sense because you cannot generate such labels from a C program.
There are also options that manage it like:
man as: --keep-locals
man ld: --discard-all
This seems to be a GCC toolchain specific convention, not part of any ELF ABI or of NASM.
Furthermore, both NASM and GAS use the convention that labels starting with a period (except .L in GAS) generate local symbols: http://www.nasm.us/doc/nasmdoc3.html#section-3.9 These are still present in the output but are not used across object files.
Suffixes
The suffixes you mention all appear to be debugging related, as they are all defined under gcc/dwarf2out.c on GCC 4.8.2 and DWARF2 is a major debugging information format for ELF:
#define FUNC_BEGIN_LABEL "LFB"
#define FUNC_END_LABEL "LFE"
#define BLOCK_BEGIN_LABEL "LBB"
#define BLOCK_END_LABEL "LBE"
ASM_GENERATE_INTERNAL_LABEL (loclabel, "LVL", loclabel_num);
From my experiments, some of them are generated only with gcc -g, others even without -g.
Once we have those define names, it is easy to write C code that makes GCC emit them, to see what they mean:
LFB and LFE are generated at the beginning and end of functions
LBB and LBE were generated by the following code with gcc -g for inner block scopes within a function:
#include <stdio.h>
int main() {
int i = 0;
{
int i = 1;
printf("%d\n", i);
}
return 0;
}
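As a further sketch (an assumption on my part, not something taken from the GCC sources quoted above): inlined calls also seem to produce block labels, which would explain the nested .LBB4/.LBB5 pairs around the line-2 instructions in the question's listing. The names a, b, scale and run_scale below are hypothetical, and the original source of that listing is not known; compiling something like this with gcc -O2 -g -S should let you check which labels your GCC version emits.

```c
/* Two globals, mirroring the a(%rip) and b(%rip) operands in the listing. */
double a = 2.0;
double b = 3.0;

/* Small helper that the optimizer is expected to inline into its caller. */
static inline void scale(void)
{
    a *= b;
}

/* With optimization and -g, the body of scale() is typically merged into
   this function, and the debug info records it as an inlined block. */
double run_scale(void)
{
    scale();
    return a;
}
```

In the generated assembly, the instructions that come from scale() should carry .loc entries pointing at its source line, surrounded by .LBB/.LBE labels, while the rest of run_scale has its own .loc lines.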
LVL: TODO I was not able to easily understand it. We'd need to interpret the source some more.
The .loc directive is used to indicate the corresponding line of source code.
It indicates the file number, line number and column number of the corresponding source code.
The rest look like labels.

Fatal error when using FILE* in Windows from DLL

Some time ago I hit a problem with the Visual C++ 2008 compiler and worked around it with a minor hack. I can no longer use that hack, and the problem exists in both 2008 and 2010 (Express).
So I've prepared two simple C files: one for the DLL and one for the program:
DLL (file-dll.c):
#include <stdio.h>
__declspec(dllexport) void
print_to_stream (FILE *stream)
{
fprintf (stream, "OK!\n");
}
Program (file-test.c), which links to the DLL via file-dll.lib:
#include <stdio.h>
__declspec(dllimport) void print_to_stream (FILE *stream);
int
main (void)
{
print_to_stream (stdout);
return 0;
}
To compile and link DLL:
cl /LD file-dll.c
To compile and link program:
cl file-test.c file-dll.lib
When I run file-test.exe, I get a fatal error (similar to a segmentation fault on UNIX).
As I said earlier, I have had the same problem before when passing a FILE* pointer to a DLL. I thought it might be caused by a compiler mismatch, but now I'm using one compiler for everything, so that's not the problem. ;-(
What can I do now?
UPD:
I've found solution:
cl /LD /MD file-dll.c
cl /MD file-test.c file-dll.lib
The key is to link against the dynamic CRT library (/MD). I did not know that by default it links statically, and hence the error occurs (now I see why).
P.S. Thanks for patience.
Potential Errors Passing CRT Objects Across DLL Boundaries
That page has a specific example for your situation. Depending on how you compile your DLL and program, you might end up with separate copies of the CRT, which results in an access violation.
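If you cannot guarantee that both sides use the same CRT, one common way around the problem is to avoid passing FILE* across the boundary at all. The sketch below is one possible shape for that, not the only one: the DLL fills a caller-supplied buffer and the program writes it out with its own CRT. format_message is a hypothetical replacement for print_to_stream; the __declspec(dllexport) decoration is left out so that the sketch compiles anywhere.

```c
#include <stddef.h>
#include <string.h>

/* DLL side: copy the message into a buffer owned by the caller, so that no
   CRT object (such as FILE*) ever crosses the DLL boundary.
   Returns the message length without the terminator, or -1 if the buffer
   is missing or too small. */
int format_message(char *buf, size_t size)
{
    static const char msg[] = "OK!\n";
    const size_t len = sizeof msg - 1;

    if (buf == NULL || size <= len)
        return -1;
    memcpy(buf, msg, len + 1); /* includes the terminating '\0' */
    return (int)len;
}
```

The program would then call format_message(buf, sizeof buf) and pass buf to fputs with its own stdout, so each module only ever touches the stdio state of its own CRT.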
