I am trying to figure out the differences between ELF executables linked for QNX and Linux, respectively.
Is the exact ELF executable format needed by QNX documented somewhere? Minimally, what sections, segments, entries, etc. do I need in an ELF file to get the QNX loader to execute the instruction at the specified entry point?
Related
Is there any way by which we can identify that a .obj file and .exe file is 16/32 bit?
Basically I want to create a smart linker, that will automatically identify which linker do the given file names need to be passed to.
Preferred Language: C (it can be different, if needed)
I am looking for some solution that can read the bytes of an .exe/the code of an .obj file and then determine if it's 16/32 bit. Even an algorithm would too do.
Note: I know both object code and a executable are two different entities.
All of this information is encoded in the binary object according to the relevant Application Binary Interface (ABI).
The current Linux ABI is the Executable and Linkable Format (ELF), and you can query a specific binary file using a tool such as readelf or objdump.
The current Windows ABI is the Portable Executable (PE) format. I'm not familiar with the toolset here but a quick google search suggests there are programs that function the same as readelf:
http://www.pe-explorer.com/peexplorer-tour.htm
Here's the Microsoft specification of the PE format:
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
However, neither of those formats support 16-bit binaries anymore. The older ABI format is called "a.out" for Linux, which can be read and queried with objdump (I'm not sure about readelf). The older Windows/DOS formats are called MZ and NE. Again, I'm not familiar with the tool support for these older Windows formats.
Wikipedia has a pretty comprehensive list of all the popular executable file formats that have been used, with links to more info:
https://en.wikipedia.org/wiki/Comparison_of_executable_file_formats
I have a x86_64 machine, and it can run IA32 process because I have installed a 32bits library. Now I want to know what's the platform that a running process is using? 64bits or 32bits?
The only way I can access the process is a ptrace system call; I don't have executable file (like I can just execute the file but I don't have read and write permissions), so I can't get the ELF header.
The OS I'm using is Ubuntu 14.04 LTS.
I don't want to get executable file, and then analyse the ELF format. The ONLY WAY I can access the process is ptrace, or other system calls same as ptrace if you know, please tell me. Because I want to analyse the process in a C program.
This was also asked on https://unix.stackexchange.com/questions/106234/determine-if-a-specific-process-is-32-or-64-bit, with limited success / viability for detection methods other than checking ELF headers after getting to them in various ways.
Looking at /proc/<pid>/maps for 64-bit addresses looks viable. So does checking the bitness of /proc/<pid>/exe:
$ file - < /proc/$(pidof a.out)/exe
/dev/stdin: ELF 32-bit LSB pie executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=ff6b5084918be4e4daf7e4315fa4d6dd6a039ae7, for GNU/Linux 4.4.0, with debug_info, not stripped
(Or file -L to follow symlinks. If you just use file /.../exe, it tells you symbolic link to /tmp/a.out.)
Note that files in /proc track the actual inode, so replacing /tmp/a.out with a different executable does not throw this off. Opening it for reading will open the actual executable that this process has mapped, separate from the name it would report via a readlink() system call. If that inode has no directory entries anymore, file will report symbolic link to /tmp/a.out (deleted), but opening for reading will still get the contents. And renaming a.out will get the kernel to report the new name as the symlink, like /tmp/bar.
This answer previously suggested looking at /proc/<pid>/personality, but the difference I was seeing was that a 32-bit process had the READ_IMPLIES_EXEC bit set. That's likely because I built a 32-bit executable from asm sources I had been playing with at the time, without the .note.GNU-stack,"",#progbits that overrides the default of executable stacks, previously implemented by making all pages executable: Unexpected exec permission from mmap when assembly files included in the project
A 32-bit executable compiled by gcc -m32 has personality 00000000, same as 64-bit /bin/sleep. So this isn't a useful detection mechanism, unfortunately. I was hoping that 32-bit processes would have some bits set like ADDR_LIMIT_32BIT, but apparently that's implicit for a 32-bit process, perhaps as part of the "execution domain" like PER_LINUX32.
I got 00400000 (just READ_IMPLIES_EXEC) for a 32-bit process with executable stacks (and everything else). (And 00440000 when I had it stopped in a debugger.) The proc(5) man page says it tells you the personality as set by personality(2).
Kernel source for personality bit-numbers: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/personality.h
glibc's copy: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/sys/personality.h;hb=HEAD
Mostly leaving this part of the answer here in case the part about decoding personalities is useful to future readers. It's not relevant to finding the bitness of a running process.
Try using xocopy to create a copy of the executable in question and dump the ELF header from the copy created.
This tool and the problem you are describing was discussed elsewhere and that discussion may be helpful to you as well.
I am working on a project that involves ELF binary file parsing. From past few weeks I am reading quite a bit on ELF format.
However, one thing I really want to understand is how linkers and loaders use the different sections in an ELF file. Can someone please suggest me some resources which can teach me about the same. Appreciate your help.
A detailed description of how an ELF linker works by the author of GNU gold can be found in a series of blog posts starting here.
(Runtime) loaders do not use any sections of the ELF files (a valid ET_DYN or ET_EXEC ELF file can have all section headers stripped). They only use segments. I don't know of any good description of an ELF loader, but here is a possible start.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
dlopen from memory?
I've seen this for Windows' DLL files, being loaded from a memory buffer, but I cant find it anywhere for Linux, and "ld" source code is the most complex code I've ever seen. So:
Is there any example of loading .so files from memory? Even a simple one that I can finish? I just don't know where to start, even though I've read most of the ELF specifications it's still mysterious to me.
You're looking at the source code of a wrong thing: ld doesn't do program and library loading. Instead, you should look at the source code of dlopen and dlsym functions found in libc. Also, you should look at the source of the dynamic linker: ld-linux.so (the true name varies with the platform; execute ldd /bin/ls to find out where the dynamic linker resides).
ELF parsing isn't difficult, but it requires attention to detail and understanding of assembly code for the particular CPU; you need also ABI specification for your platform (and it's different for 32- and 64-bit linux, and is also different between CPUs.)
If you just need to load object files from memory at run-time (i.e., it doesn't have to be a SO), you can look at X11 project: they have implemented a module system which, basically, loads object code at some address and relocates it.
You need dlopen() family of functions (on GNU/Linux, they are defined in /usr/include/dlfcn.h).
For an example, take a look at how PHP does modules.
What does "loading .so files from memory" means to you?
If you have any *.so file, then it is in some file system, and has a path. Then just use dlopen on it.
If it is not a file, what is it? How did you get in memory? What exactly have you in memory? (Do you have an ELF header and ELF layout in memory?)
If you have enough information to make an ELF *.so file, dump (i.e. write) such file into some file system (use a temporary filesystem like tmpfs if you are concerned with disk performance). Then dlopen that.
If you don't have enough information to make an ELF .so file, then probably you are dynamically building code in memory. Look at what existing machine code generating infrastructure (like LLVM, GCCJIT, libjit, GNU lightning, LuaJit ....) are doing.
If you have a full functional code in memory, ensure that the memory is executable with mmap & mprotect and jump into it (e.g. using function pointer tricks).
If binary file size is not an issue, are there any drawbacks using -g and not strip binaries that are to be run in a performance critical environment? I have a lot of disk space but the binary is cpu intensive and uses a lot of memory. The binary is loaded once and is alive for several hours.
EDIT:
The reason why I want to use binaries with debugging information is to generate useful core dumps in case of segmentation faults.
The ELF loader loads segments, not sections; the mapping from sections to segments is determined by the linker script used for building the executable.
The default linker script does not map debug sections to any segment, so this is omitted.
Symbol information comes in two flavours: static symbols are processed out-of-band and never stored as section data; dynamic symbol tables are generated by the linker and added to a special segment that is loaded along with the executable, as it needs to be accessible to the dynamic linker. The strip command only removes the static symbols, which are never referenced in a segment anyway.
So, you can use full debug information through the entire process, and this will not affect the size of the executable image in RAM, as it is not loaded. This also means that the information is not included in core dumps, so this does not give you any benefit here either.
The objcopy utility has a special option to copy only the debug information, so you can generate a second ELF file containing this information and use stripped binaries; when analyzing the core dump, you can then load both files into the debugger:
objcopy --only-keep-debug myprogram myprogram.debug
strip myprogram