Recently, I have been playing around with ELF files, and I tried to solve this problem:
Given an eip value, print the name of the function that contains it in an ELF executable.
I can do this with the symbol table and the string table. Since I only need to deal with symbols whose type is STT_FUNC, I wrote the following program:
for (i = 0; i < nr_symtab_entry; ++i) {
    if ((symtab[i].st_info == STT_FUNC) &&
        eip < symtab[i].st_value + symtab[i].st_size &&
        eip >= symtab[i].st_value) {
        strcpy(funcName, strtab + symtab[i].st_name);
    }
}
where symtab is the symbol table, strtab is the string table.
But after several tests, I realized that the program above is wrong. After several trials, I changed it into this:
for (i = 0; i < nr_symtab_entry; ++i) {
    if ((symtab[i].st_info & STT_FUNC) &&
        eip < symtab[i].st_value + symtab[i].st_size &&
        eip >= symtab[i].st_value) {
        strcpy(funcName, strtab + symtab[i].st_name);
    }
}
Then it worked! But when I checked man elf, the manual told me:
st_info This member specifies the symbol’s type and binding attributes
It didn't mention whether it is a bit flag or not. Then I ran into a problem that requires checking whether a segment is PT_LOAD, and again the manual does not say whether that is a bit flag. So I came here to ask for help: is PT_LOAD also a bit flag? Is every symbolic constant in an ELF file a bit flag?
It seems that st_info can be interpreted by specific macros. But how about p_type?
Use:
if (ELF32_ST_TYPE(symtab[i].st_info) == STT_FUNC && ...
as the Linux kernel does, for example, in kernel/module.c.
ELF32_ST_TYPE extracts the type of a symbol from st_info. I can't find a definitive list of symbol types anywhere, but from #define ELF32_ST_TYPE(info) ((info) & 0xf) and the definitions in elf.h I am fairly sure that ELF32_ST_TYPE(st_info) equals one of the following macros:
#define STT_NOTYPE 0
#define STT_OBJECT 1
#define STT_FUNC 2
#define STT_SECTION 3
#define STT_FILE 4
#define STT_COMMON 5
#define STT_TLS 6
It is documented in man elf:
There are macros for packing and unpacking the binding and
type fields:
ELF32_ST_BIND(info), ELF64_ST_BIND(info)
Extract a binding from an st_info value.
ELF32_ST_TYPE(info), ELF64_ST_TYPE(info)
Extract a type from an st_info value.
ELF32_ST_INFO(bind, type), ELF64_ST_INFO(bind, type)
Convert a binding and a type into an st_info value.
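A minimal sketch of the corrected lookup, assuming a 32-bit ELF file whose symbol table and string table are already in memory; the helper names func_name_for_eip and count_load_segments are hypothetical. It also explains why the earlier attempts behaved oddly: a global function has st_info 0x12 (binding STB_GLOBAL in the high 4 bits, type STT_FUNC in the low 4 bits), so st_info == STT_FUNC fails, while st_info & STT_FUNC also matches STT_SECTION (3) and STT_TLS (6). The second helper shows the p_type case: p_type holds one enumerated value (PT_LOAD is 1, PT_DYNAMIC is 2, PT_PHDR is 6), so it is compared with == rather than masked.

```c
#include <elf.h>
#include <stddef.h>

/* Return the name of the STT_FUNC symbol whose range [st_value, st_value + st_size)
 * contains eip, or NULL if no function symbol covers it. */
static const char *func_name_for_eip(const Elf32_Sym *symtab, size_t nsyms,
                                     const char *strtab, Elf32_Addr eip)
{
    for (size_t i = 0; i < nsyms; ++i) {
        const Elf32_Sym *s = &symtab[i];
        /* Low 4 bits of st_info = type, high 4 bits = binding. */
        if (ELF32_ST_TYPE(s->st_info) == STT_FUNC &&
            eip >= s->st_value && eip - s->st_value < s->st_size)
            return strtab + s->st_name;
    }
    return NULL;
}

/* p_type is an enumerated value, not a bit mask: compare it with ==. */
static size_t count_load_segments(const Elf32_Phdr *phdrs, size_t nphdrs)
{
    size_t n = 0;
    for (size_t i = 0; i < nphdrs; ++i)
        if (phdrs[i].p_type == PT_LOAD)
            ++n;
    return n;
}
```

The subtraction eip - s->st_value < s->st_size avoids overflowing st_value + st_size for symbols near the top of the address space.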
/* UART HEADER */
#define featureA 0xA0
#define featureB 0xB0
#define featureC 0x20
/* ... */
/* more features are added over time; their values are arbitrary */
/* ... */
#define featureZ 0x??
#define CHECK_CHAR(x) ((x==featureA)||(x==featureB)||(x==featureC)|| ... ||(x==featureZ)? TRUE: FALSE)
Hi all, I have a set of UART headers that indicate what each command is for. Every time, I check the header using the macro above, but as the number of commands keeps increasing, the macro keeps getting longer and the code becomes very messy. I am looking for a mechanism to handle this check as more and more features are added.
Since this seems to be a run-time check, you can speed up the program considerably by using a look-up table instead. It will also make the code more readable. Assuming you can spare 256 bytes of flash and all codes are unique, then:
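#include <stdbool.h>
#include <stdint.h>

bool CHECK_CHAR (uint8_t ch)
{
    /* static: the table is built once and can be placed in flash (read-only memory) */
    static const bool LOOKUP [256] =
    {
        [featureA] = true,
        [featureB] = true,
        [featureC] = true,
    };
    return LOOKUP[ch];
}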
The second best option is a sorted array of uint8_t constants + binary search.
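A minimal sketch of that second option using the standard bsearch; the array contents are the three example codes from the question (featureC, featureA, featureB) and the function name check_char_sorted is hypothetical. The array must be kept sorted in ascending order for the binary search to work.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Known header codes, sorted ascending (required by bsearch). */
static const uint8_t FEATURES[] = { 0x20 /* featureC */, 0xA0 /* featureA */, 0xB0 /* featureB */ };

/* Three-way comparison of two uint8_t values for bsearch. */
static int cmp_u8(const void *a, const void *b)
{
    uint8_t x = *(const uint8_t *)a;
    uint8_t y = *(const uint8_t *)b;
    return (x > y) - (x < y);
}

/* True if ch is one of the known header codes. */
static bool check_char_sorted(uint8_t ch)
{
    return bsearch(&ch, FEATURES, sizeof FEATURES / sizeof FEATURES[0],
                   sizeof FEATURES[0], cmp_u8) != NULL;
}
```

This costs only one byte per code instead of 256 bytes, at the price of O(log n) comparisons per check.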
Assuming that your values follow the simple pattern above, this would work:
#define CHECK_CHAR(x) ( ((x)&0xf) == 0 && (x) >= featureA && (x) < featureInvalid )
where you define featureInvalid to be the next logical value after featureZ.
I am curious about the DT_USED entry in the .dynamic section. However, I could only find two code examples that mention this entry.
1.
#define DT_USED 0x7ffffffe /* ignored - same as needed */
in https://github.com/switchbrew/switch-tools/blob/master/src/elf_common.h
2.
case DT_USED:
case DT_INIT_ARRAY:
case DT_FINI_ARRAY:
  if (do_dynamic)
    {
      if (entry->d_tag == DT_USED
          && VALID_DYNAMIC_NAME (entry->d_un.d_val))
        {
          char *name = GET_DYNAMIC_NAME (entry->d_un.d_val);
          if (*name)
            {
              printf (_("Not needed object: [%s]\n"), name);
              break;
            }
        }
      print_vma (entry->d_un.d_val, PREFIX_HEX);
      putchar ('\n');
    }
  break;
in http://web.mit.edu/freebsd/head/contrib/binutils/binutils/readelf.c
What does "Not needed object" mean? Does it mean that the file names listed here are not needed?
In general, when looking at Solaris dynamic linker features, it is possible to find more information in the public Illumos sources (which were once derived from OpenSolaris). In this case, it seems that DT_USED is always treated like DT_NEEDED, so they are essentially the same thing. One of the header files, usr/src/uts/common/sys/link.h, also contains this:
/*
* DT_* entries between DT_HIPROC and DT_LOPROC are reserved for processor
* specific semantics.
*
* DT_* encoding rules apply to all tag values larger than DT_LOPROC.
*/
#define DT_LOPROC 0x70000000 /* processor specific range */
#define DT_AUXILIARY 0x7ffffffd /* shared library auxiliary name */
#define DT_USED 0x7ffffffe /* ignored - same as needed */
#define DT_FILTER 0x7fffffff /* shared library filter name */
#define DT_HIPROC 0x7fffffff
Something may have been planned here, but it doesn't seem to be implemented (or it once was and no longer is).
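To illustrate "treated like DT_NEEDED", here is a minimal sketch that walks a 64-bit dynamic array and reports DT_USED entries the same way as DT_NEEDED entries. It assumes the dynamic array (terminated by DT_NULL) and its string table (the one DT_STRTAB points to) have already been located in memory; the function name print_dependencies is hypothetical. DT_USED is defined locally with the value from the header above because glibc's <elf.h> may not provide it.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

#ifndef DT_USED
#define DT_USED 0x7ffffffe  /* Solaris/Illumos value: "ignored - same as needed" */
#endif

/* Print every dependency named by DT_NEEDED or DT_USED and return how many
 * there were. d_val of both tags is an offset into the dynamic string table. */
static size_t print_dependencies(const Elf64_Dyn *dyn, const char *strtab)
{
    size_t count = 0;
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        if (dyn->d_tag == DT_NEEDED || dyn->d_tag == DT_USED) {
            printf("%s: %s\n", dyn->d_tag == DT_USED ? "USED" : "NEEDED",
                   strtab + dyn->d_un.d_val);
            ++count;
        }
    }
    return count;
}
```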
I would like to compute something in C according to the version of a library whose header I can't change.
However, the library I am using defines its version as strings with #defines like:
/* major version */
#define MAJOR_VERSION "2"
/* minor version */
#define MINOR_VERSION "2"
So my question is: how do I define a macro STR_TO_INT that converts the strings MINOR_VERSION and MAJOR_VERSION to integers, so that I can write:
#if ((STR_TO_INT(MAJOR_VERSION) == 2 && STR_TO_INT(MINOR_VERSION) >= 2) || (STR_TO_INT(MAJOR_VERSION) > 2))
//I perform an action...
#else
//I perform a different action
#endif
I prefer a macro since I am using a lot of functions from this library. Please feel free to suggest any idea.
Preprocess the official library header, libheader.h, to generate the same information without the quotes in a new header, libversion.h:
sed -n -e '/^#define \(M[AI][JN]OR\)_VERSION "\([0-9][0-9]*\)".*/ {
s//#define NUM_\1_VERSION \2/p
}' libheader.h >libversion.h
You might need to be more flexible about allowing spaces and tabs around #, define and the macro name. I also assume there are no comments embedded in the definition, as in the following (trailing comments are handled):
/* This starts in column 1 - unlike the next line */
# define /* No comment here */ MAJOR_VERSION /* Nor here */ "2"
Now you can include both libheader.h and libversion.h and compare the numeric versions with impunity (as long as you get the expressions correct):
#include "libheader.h"
#include "libversion.h"
#if ((NUM_MAJOR_VERSION == 2 && NUM_MINOR_VERSION >= 2) || NUM_MAJOR_VERSION > 2)
…perform the new action…
#else
…perform the old action…
#endif
Strictly, the sed script will also convert MIJOR_VERSION and MANOR_VERSION; however, they're unlikely to appear in the library header, and you can ignore the generated numeric versions with ease. There are ways to deal with that if you really think it is an actual rather than hypothetical problem.
More seriously, if the library has complicated controls on the version information, it could be that a single header can masquerade as different versions of the library — there could be multiple lines defining the major and minor versions. If that's the case, you have to work a lot harder.
If the version is defined as #define MAJOR_VERSION 2 (an int, without quotes), it works anywhere and no string conversion is needed. You can directly do:
if (MAJOR_VERSION == 2) { /* version 2 */ }
else { /* not version 2 */ }
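If the check can happen at run time instead of in #if, the quoted strings can also be converted with the standard strtol. A minimal sketch, assuming the library header defines MAJOR_VERSION and MINOR_VERSION as in the question (fallback definitions are included so the snippet compiles on its own); the function name version_at_least is hypothetical. Because strtol runs at run time, this cannot replace an #if test.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for the library header's macros, as shown in the question. */
#ifndef MAJOR_VERSION
#define MAJOR_VERSION "2"
#endif
#ifndef MINOR_VERSION
#define MINOR_VERSION "2"
#endif

/* True if the library version is at least major.minor. */
static bool version_at_least(long major, long minor)
{
    long lib_major = strtol(MAJOR_VERSION, NULL, 10);
    long lib_minor = strtol(MINOR_VERSION, NULL, 10);
    return lib_major > major || (lib_major == major && lib_minor >= minor);
}
```

Usage: if (version_at_least(2, 2)) { /* new action */ } else { /* old action */ }. With optimization enabled, compilers can often fold this to a constant, but that is not guaranteed.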
I'm dynamically loading some Linux libraries in C.
I can get the start addresses of the libraries using dlinfo (see 1).
However, I can't find any way to get the size of a library.
The only thing I've found is that one must read the /proc/[pid]/maps file and parse it for the relevant information (see 2).
Is there a more elegant method?
(This answer is Linux/glibc specific.)
According to http://s.eresi-project.org/inc/articles/elf-rtld.txt, a struct link_map *map has the fields map->l_map_start and map->l_map_end:
/*
** Start and finish of memory map for this object.
** l_map_start need not be the same as l_addr.
*/
ElfW(Addr) l_map_start, l_map_end;
This is not exact, as explained at http://www.cygwin.com/ml/libc-hacker/2007-06/msg00014.html: some libraries are not contiguous in memory, and the linked letter has some examples. For instance, here is a function internal to rtld that checks whether a given address lies inside a library's address space, based on the link_map and by working directly with the ELF segments:
/* Return non-zero if ADDR lies within one of L's segments.  */
int
internal_function
_dl_addr_inside_object (struct link_map *l, const ElfW(Addr) addr)
{
  int n = l->l_phnum;
  const ElfW(Addr) reladdr = addr - l->l_addr;

  while (--n >= 0)
    if (l->l_phdr[n].p_type == PT_LOAD
        && reladdr - l->l_phdr[n].p_vaddr >= 0
        && reladdr - l->l_phdr[n].p_vaddr < l->l_phdr[n].p_memsz)
      return 1;
  return 0;
}
This function illustrates the other alternative: find the program headers (or section headers) of the loaded ELF object; link_map contains some pointers to this information.
The easiest approach is to call stat on map->l_name to read the file size from disk, though this is inexact (a huge .bss section is not reflected in the file size).
Parsing /proc/self/maps (or perhaps popen-ing a pmap command) still seems the easiest thing to me. There is also the dladdr function (provided you have some address to start with).
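The "read the program headers of the loaded object" alternative can also be done through the public dl_iterate_phdr function, which glibc provides, instead of glibc's internal link_map fields. A minimal sketch, assuming you identify the library by a substring of its path; struct span_query and span_cb are hypothetical names. It reports the lowest and highest addresses covered by PT_LOAD segments, so, as noted above, the span includes any gaps between non-contiguous segments.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <string.h>

struct span_query {
    const char *name_part;  /* substring of the object's path; "" matches the main program */
    uintptr_t lo, hi;       /* result: [lo, hi) covered by the PT_LOAD segments */
    int found;
};

static int span_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct span_query *q = data;
    const char *name = info->dlpi_name ? info->dlpi_name : "";
    (void)size;

    if (strstr(name, q->name_part) == NULL)
        return 0;                          /* not this object: keep iterating */

    uintptr_t lo = UINTPTR_MAX, hi = 0;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;  /* load bias + vaddr */
        uintptr_t end = start + ph->p_memsz;              /* p_memsz includes .bss */
        if (start < lo) lo = start;
        if (end > hi) hi = end;
    }
    if (hi == 0)
        return 0;                          /* no PT_LOAD segments */
    q->lo = lo;
    q->hi = hi;
    q->found = 1;
    return 1;                              /* non-zero stops dl_iterate_phdr */
}
```

Usage: fill a struct span_query with, for example, "libm.so" as name_part, call dl_iterate_phdr(span_cb, &q), and if q.found is set, q.hi - q.lo is the mapped span. glibc reports the main program first with an empty name.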
I'm writing some code which stores some data structures in a special named binary section. These are all instances of the same struct which are scattered across many C files and are not within scope of each other. By placing them all in the named section I can iterate over all of them.
In GCC, I use __attribute__((section(...))) plus some specially named extern symbols which the linker magically defines. Here's a trivial example:
#include <stdio.h>

extern int __start___mysection[];
extern int __stop___mysection[];

/* "used" stops GCC from discarding these otherwise unreferenced statics */
static int x __attribute__((used, section("__mysection"))) = 4;
static int y __attribute__((used, section("__mysection"))) = 10;
static int z __attribute__((used, section("__mysection"))) = 22;

/* Number of ints between the linker-defined start and stop symbols */
#define SECTION_SIZE(sect) \
    ((size_t)((__stop_##sect - __start_##sect)))

int main(void)
{
    size_t sz = SECTION_SIZE(__mysection);
    size_t i;

    printf("Section size is %zu\n", sz);
    for (i = 0; i < sz; i++) {
        printf("%d\n", __start___mysection[i]);
    }
    return 0;
}
I'm trying to figure out how to do this in MSVC but I'm drawing a blank. I see from the compiler documentation that I can declare the section using __pragma(section(...)) and declare data to be in that section with __declspec(allocate(...)) but I can't see how I can get a pointer to the start and end of the section at runtime.
I've seen some examples on the web of emulating __attribute__((constructor)) in MSVC, but they seem to be CRT-specific hacks rather than a general way to get a pointer to the beginning and end of a section. Does anyone have any ideas?
There is also a way to do this without using an assembly file.
#pragma section(".init$a")
#pragma section(".init$u")
#pragma section(".init$z")
__declspec(allocate(".init$a")) int InitSectionStart = 0;
__declspec(allocate(".init$z")) int InitSectionEnd = 0;
__declspec(allocate(".init$u")) int token1 = 0xdeadbeef;
__declspec(allocate(".init$u")) int token2 = 0xdeadc0de;
The first three lines define the sections and take the place of the assembly file. Unlike the data_seg pragma, the section pragma only creates the section.
The __declspec(allocate()) lines tell the compiler to put each item in that section.
From the Microsoft page:
The order here is important. Section names must be 8 characters or less. The sections with the same name before the $ are merged into one section. The order that they are merged is determined by sorting the characters after the $.
Another important point to remember is that sections are zero-padded to 256 bytes, so the start and end markers will NOT be directly before and after the data, as you would expect.
If you set up your table to hold function pointers or other non-NULL values, it should be easy to skip the NULL entries that the section padding leaves before and after the table.
See this MSDN page for more details.
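The skip-the-NULLs idea can be sketched portably: walk every pointer-sized slot between the start and end markers and keep only the non-NULL ones. This assumes the table holds pointers (the answer's markers are int; declaring them as void * instead is an assumption) and that the padding is a whole number of pointer-sized zero slots. The function name collect_table_entries is hypothetical, and walking from one marker object to another is outside standard C; it relies on the linker layout described above, the same $a/$z grouping the MSVC CRT uses for its own initializer table (.CRT$XCA to .CRT$XCZ).

```c
#include <stddef.h>

/* Copy every non-NULL pointer in [begin, end) into out (at most cap entries)
 * and return how many non-NULL entries were found. Zero padding between the
 * .init$a, .init$u and .init$z contributions reads back as NULL and is skipped. */
static size_t collect_table_entries(void *const *begin, void *const *end,
                                    void **out, size_t cap)
{
    size_t n = 0;
    for (void *const *p = begin; p < end; ++p) {
        if (*p == NULL)
            continue;            /* padding or an empty marker */
        if (n < cap)
            out[n] = *p;
        ++n;
    }
    return n;
}
```

With MSVC, assuming markers declared as __declspec(allocate(".init$a")) void *InitSectionStart = NULL; and the matching .init$z marker, you would pass &InitSectionStart and &InitSectionEnd as begin and end.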
First of all, you'll need to create an ASM file containing all the sections you are interested in (for example, section.asm):
.686
.model flat
PUBLIC C __InitSectionStart
PUBLIC C __InitSectionEnd
INIT$A SEGMENT DWORD PUBLIC FLAT alias(".init$a")
__InitSectionStart EQU $
INIT$A ENDS
INIT$Z SEGMENT DWORD PUBLIC FLAT alias(".init$z")
__InitSectionEnd EQU $
INIT$Z ENDS
END
Next, in your code you can use the following:
#pragma data_seg(".init$u")
int token1 = 0xdeadbeef;
int token2 = 0xdeadc0de;
#pragma data_seg()
This produces a map file like this:
Start Length Name Class
0003:00000000 00000000H .init$a DATA
0003:00000000 00000008H .init$u DATA
0003:00000008 00000000H .init$z DATA
Address Publics by Value Rva+Base Lib:Object
0003:00000000 ?token1@@3HA 10005000 dllmain.obj
0003:00000000 ___InitSectionStart 10005000 section.obj
0003:00000004 ?token2@@3HA 10005004 dllmain.obj
0003:00000008 ___InitSectionEnd 10005008 section.obj
So, as you can see, the section named .init$u is placed between .init$a and .init$z, which lets you get a pointer to the beginning of the data via the __InitSectionStart symbol and to the end of the data via the __InitSectionEnd symbol.
I experimented a bit and tried to implement the version without an assembly file, but struggled with the varying number of padding bytes between the sections. This makes it almost impossible to find the start of the .init$u part unless its content is just pointers or other simple items that can be checked for NULL or some other known pattern.
Whether padding is inserted seems to correlate with the /Zi debug option: with it, padding is inserted; without it, all sections appear exactly as one would want.
ML64 lets you cut out a lot of the assembly noise:
public foo_start
public foo_stop
.code foo$a
foo_start:
.code foo$z
foo_stop:
end