Finding the address range of the data segment

Finding the address range of the data segment - c

As a programming exercise, I am writing a mark-and-sweep garbage collector in C. I wish to scan the data segment (globals, etc.) for pointers to allocated memory, but I don't know how to get the range of the addresses of this segment. How could I do this?

If you're working on Windows, then there are Windows API that would help you.
//store the base address the loaded Module
dllImageBase = (char*)hModule; //suppose hModule is the handle to the loaded Module (.exe or .dll)
//get the address of NT Header
IMAGE_NT_HEADERS *pNtHdr = ImageNtHeader(hModule);
//after Nt headers comes the table of section, so get the addess of section table
IMAGE_SECTION_HEADER *pSectionHdr = (IMAGE_SECTION_HEADER *) (pNtHdr + 1);
ImageSectionInfo *pSectionInfo = NULL;
//iterate through the list of all sections, and check the section name in the if conditon. etc
for ( int i = 0 ; i < pNtHdr->FileHeader.NumberOfSections ; i++ )
{
char *name = (char*) pSectionHdr->Name;
if ( memcmp(name, ".data", 5) == 0 )
{
pSectionInfo = new ImageSectionInfo(".data");
pSectionInfo->SectionAddress = dllImageBase + pSectionHdr->VirtualAddress;
**//range of the data segment - something you're looking for**
pSectionInfo->SectionSize = pSectionHdr->Misc.VirtualSize;
break;
}
pSectionHdr++;
}
Define ImageSectionInfo as,
struct ImageSectionInfo
{
char SectionName[IMAGE_SIZEOF_SHORT_NAME];//the macro is defined WinNT.h
char *SectionAddress;
int SectionSize;
ImageSectionInfo(const char* name)
{
strcpy(SectioName, name);
}
};
Here's a complete, minimal WIN32 console program you can run in Visual Studio that demonstrates the use of the Windows API:
#include <stdio.h>
#include <Windows.h>
#include <DbgHelp.h>
#pragma comment( lib, "dbghelp.lib" )
void print_PE_section_info(HANDLE hModule) // hModule is the handle to a loaded Module (.exe or .dll)
{
// get the location of the module's IMAGE_NT_HEADERS structure
IMAGE_NT_HEADERS *pNtHdr = ImageNtHeader(hModule);
// section table immediately follows the IMAGE_NT_HEADERS
IMAGE_SECTION_HEADER *pSectionHdr = (IMAGE_SECTION_HEADER *)(pNtHdr + 1);
const char* imageBase = (const char*)hModule;
char scnName[sizeof(pSectionHdr->Name) + 1];
scnName[sizeof(scnName) - 1] = '\0'; // enforce nul-termination for scn names that are the whole length of pSectionHdr->Name[]
for (int scn = 0; scn < pNtHdr->FileHeader.NumberOfSections; ++scn)
{
// Note: pSectionHdr->Name[] is 8 bytes long. If the scn name is 8 bytes long, ->Name[] will
// not be nul-terminated. For this reason, copy it to a local buffer that's nul-terminated
// to be sure we only print the real scn name, and no extra garbage beyond it.
strncpy(scnName, (const char*)pSectionHdr->Name, sizeof(pSectionHdr->Name));
printf(" Section %3d: %p...%p %-10s (%u bytes)\n",
scn,
imageBase + pSectionHdr->VirtualAddress,
imageBase + pSectionHdr->VirtualAddress + pSectionHdr->Misc.VirtualSize - 1,
scnName,
pSectionHdr->Misc.VirtualSize);
++pSectionHdr;
}
}
// For demo purpopses, create an extra constant data section whose name is exactly 8 bytes long (the max)
#pragma const_seg(".t_const") // begin allocating const data in a new section whose name is 8 bytes long (the max)
const char const_string1[] = "This string is allocated in a special const data segment named \".t_const\".";
#pragma const_seg() // resume allocating const data in the normal .rdata section
int main(int argc, const char* argv[])
{
print_PE_section_info(GetModuleHandle(NULL)); // print section info for "this process's .exe file" (NULL)
}
This page may be helpful if you're interested in additional uses of the DbgHelp library.
You can read the PE image format here, to know it in details. Once you understand the PE format, you'll be able to work with the above code, and can even modify it to meet your need.
PE Format
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
An In-Depth Look into the Win32 Portable Executable File Format, Part 1
An In-Depth Look into the Win32 Portable Executable File Format, Part 2
Windows API and Structures
IMAGE_SECTION_HEADER Structure
ImageNtHeader Function
IMAGE_NT_HEADERS Structure
I think this would help you to great extent, and the rest you can research yourself :-)
By the way, you can also see this thread, as all of these are somehow related to this:
Scenario: Global variables in DLL which is used by Multi-threaded Application

The bounds for text (program code) and data for linux (and other unixes):
#include <stdio.h>
#include <stdlib.h>
/* these are in no header file, and on some
systems they have a _ prepended
These symbols have to be typed to keep the compiler happy
Also check out brk() and sbrk() for information
about heap */
extern char etext, edata, end;
int
main(int argc, char **argv)
{
printf("First address beyond:\n");
printf(" program text segment(etext) %10p\n", &etext);
printf(" initialized data segment(edata) %10p\n", &edata);
printf(" uninitialized data segment (end) %10p\n", &end);
return EXIT_SUCCESS;
}
Where those symbols come from: Where are the symbols etext ,edata and end defined?

Since you'll probably have to make your garbage collector the environment in which the program runs, you can get it from the elf file directly.

Load the file that the executable came from and parse the PE headers, for Win32. I've no idea about on other OSes. Remember that if your program consists of multiple files (e.g. DLLs) you may have multiple data segments.

For iOS you can use this solution. It shows how to find the text segment range but you can easily change it to find any segment you like.

Related

Function address in a Position Independent Executable

I created a small unit test library in C.
Its main feature is the fact that you don't need to register your test functions, they are identified as test functions because they have a predefined prefix (test_).
For example, if you want to create a test function, you can write something like this:
int test_abc(void *t)
{
...
}
Yes, just like in Go.
To find the test functions, the runner:
takes the name of the executable from argv[0];
parses the ELF sections to find the symbol table;
from the symbol table, takes all the functions named test_*;
treats the addresses from the symbol table as function pointers;
invoke the test functions.
For PIE binaries, there is one additional step. To find the load address for the test functions, I assume there is a common offset that applies to all functions. To figure out the offset, I subtract the address of main (runtime, function pointer) from the address of main read from the symbol table.
All the things described above are working fine: https://github.com/rodrigo-dc/testprefix
However, as far as I understood, function pointer arithmetic is not allowed by the C99 standard.
Given that I have the address from the symbol table - Is there a reliable way to get the runtime address of functions (in case of PIE binaries)?
I was hoping for some linker variable, some base address, or anything like that.

Is there a reliable way to get the runtime address of functions (in case of PIE binaries)?
Yes: see this answer, and also the comment about using dladdr().
P.S. Note that taking address of main in C++ is not allowed.

Because you have an ELF executable, this probably precludes "funny" architectures (e.g. Intel 8051, PIC, etc.) that might have segmented or non-linear, non-contiguous address spaces.
So, you [probably] can use the method you've described with main to get the actual address. You just need to convert to/from either char * or uintptr_t types so you are using byte offsets/differences.
But, you can also create a unified table of pointers to the various functions using by creating descriptor structs that are placed in a special linker section of your choosing using (e.g.) __attribute__((section("mysection"))
Here is some code that shows what I mean:
#include <stdio.h>
typedef struct {
int (*test_func)(void *); // pointer to test function
const char *test_name; // name of the test
int test_retval; // test return value
// more data ...
int test_xtra;
} testctl_t;
// define a struct instance for a given test
#define ATTACH_TEST(_func) \
testctl_t _func##_ctl __attribute__((section("testctl"))) = { \
.test_func = _func, \
.test_name = #_func \
}
// advance to next struct (must be 16 byte aligned)
#define TESTNEXT(_test) \
(testctl_t *) (((char *) _test) + asiz)
int
test_abc(void *t)
{
printf("test_abc: hello\n");
return 1;
}
ATTACH_TEST(test_abc);
int
test_def(void *t)
{
printf("test_def: hello\n");
return 2;
}
ATTACH_TEST(test_def);
int
main(void)
{
// these are special symbols defined by the linker for our special linker
// section that denote the start/end of the section (similar to
// _etext/_edata)
extern testctl_t __start_testctl;
extern testctl_t __stop_testctl;
size_t rsiz = sizeof(testctl_t);
size_t asiz;
testctl_t *test;
// align the size to a 16 byte boundary
asiz = rsiz;
asiz += 15;
asiz /= 16;
asiz *= 16;
// show the struct sizes
printf("main: sizeof(testctl_t)=%zx/%zx\n",rsiz,asiz);
// section start and stop symbol addresses
printf("main: start=%p stop=%p\n",&__start_testctl,&__stop_testctl);
// cross check of expected pointer values
printf("main: test_abc=%p test_abc_ctl=%p\n",test_abc,&test_abc_ctl);
printf("main: test_def=%p test_def_ctl=%p\n",test_def,&test_def_ctl);
for (test = &__start_testctl; test < &__stop_testctl;
test = TESTNEXT(test)) {
printf("\n");
// show the address of our test descriptor struct and the pointer to
// the function
printf("main: test=%p test_func=%p\n",test,test->test_func);
printf("main: calling %s ...\n",test->test_name);
test->test_retval = test->test_func(test);
printf("main: return is %d\n",test->test_retval);
}
return 0;
}
Here is the program output:
main: sizeof(testctl_t)=18/20
main: start=0x404040 stop=0x404078
main: test_abc=0x401146 test_abc_ctl=0x404040
main: test_def=0x401163 test_def_ctl=0x404060
main: test=0x404040 test_func=0x401146
main: calling test_abc ...
test_abc: hello
main: return is 1
main: test=0x404060 test_func=0x401163
main: calling test_def ...
test_def: hello
main: return is 2

MSP430 SD card application running on top of FATFS appears too restrictive . Is my understanding correct?

I am working my way through an SD card application code example provided by TI for their MSP530 LaunchPad microcontroller development kit. It appears that the example restricts the number of directories and number of files to 10 each (a total of 100 files) which seems overly restrictive for a 32GB SD card. The current code compiles to use less than half of the program space and less than half of available RAM. I am wondering if I misunderstand the code, or if the code is limited by some other reason, such as the available stack size in memory. Below is the code and my comments.
There are several layers: SDCardLogMode, sdcard (SDCardLib), and ff (HAL layer). I've reduced the code below to illustrate the constructions but not runnable - I am more interested if I understand it correctly and if my solution to increase the number of allowed files and directories is flawed.
SDCardLogMode.c there are two places of interest here. The first is the declaration of char dirs[10][MAX_DIR_LEN] and files[10][MAX_FILE_LEN]. the MAX LENs are 8 and 12 respectively and are the maximum allowed length of a name.
/*******************************************************************************
*
* SDCardLogMode.c
* ******************************************************************************/
#include "stdlib.h"
#include "string.h"
#include "SDCardLogMode.h"
#include "driverlib.h"
#include "sdcard.h"
#include "HAL_SDCard.h"
#pragma PERSISTENT(numLogFiles)
uint8_t numLogFiles = 0;
SDCardLib sdCardLib;
char dirs[10][MAX_DIR_LEN];
char files[10][MAX_FILE_LEN]; //10 file names. MAX_FILE_LEN =10
uint8_t dirNum = 0;
uint8_t fileNum = 0;
#define MAX_BUF_SIZE 32
char buffer[MAX_BUF_SIZE];
// FatFs Static Variables
static FIL fil; /* File object */
static char filename[31];
static FRESULT rc;
//....
Later in the same SDCardLogMode.c file is the following function (also reduced for readability). Here the interesting thing is that the code calls SDCardLib_getDirectory(&sdCardLib, "data_log", dirs, &dirNum, files, &fileNum) which consume the "data_log" path and produces dir, and updates &dirNum, files, and &fileNum. I do not believe &sdCardLib (which holds a handle to the FATFS and an interface pointer) is used in this function. At least not that I can tell.
What is puzzling is what's the point of calling SDCardLib_getDirectory() and then not using anything it produces? I did not find any downstream use of the dirs and files char arrays. Nor did I find any use of dirNum and fileNum either.
In the code snippets I show the code for SDCardLib_getDirectory(). I could not find where SDCardLib parameter is used. And as mentioned earlier, I found no use of files and dirs arrays. I can see where the file and directory count could be used to generate new names, but there are already static variables to hold the file count. Can anyone see a reason why the SDCard_getDirectory() was called?
/*
* Store TimeStamp from PC when logging starts to SDCard
*/
void storeTimeStampSDCard()
{
int i = 0;
uint16_t bw = 0;
unsigned long long epoch;
// FRESULT rc;
// Increment log file number
numLogFiles++;
,
//Detect SD card
SDCardLib_Status st = SDCardLib_detectCard(&sdCardLib);
if (st == SDCARDLIB_STATUS_NOT_PRESENT) {
SDCardLib_unInit(&sdCardLib);
mode = '0';
noSDCard = 1; //jn added
return;
}
// Read directory and file
rc = SDCardLib_getDirectory(&sdCardLib, "data_log", dirs, &dirNum, files, &fileNum);
//Create the directory under the root directory
rc = SDCardLib_createDirectory(&sdCardLib, "data_log");
if (rc != FR_OK && rc != FR_EXIST) {
SDCardLib_unInit(&sdCardLib);
mode = '0';
return;
}
//........
}
Now jumping to sdcard.c (SDCardLib layer) to look at SDCardLib_getDirectory() is interesting. It takes the array pointer assigns it to a one dimensional array (e.g. char (*fileList)[MAX_FILE_LEN] and indexes it each time it writes a filename). This code seem fragile since the SDCardLib_createDirectory() simply returns f_mkdir(directoryName), it does not check how many files already exist. Perhaps TI assumes this checking should be done at the application layer above SDCardLogMode....
void SDCardLib_unInit(SDCardLib * lib)
{
/* Unregister work area prior to discard it */
f_mount(0, NULL);
}
FRESULT SDCardLib_getDirectory(SDCardLib * lib,
char * directoryName,
char (*dirList)[MAX_DIR_LEN], uint8_t *dirNum,
char (*fileList)[MAX_FILE_LEN], uint8_t *fileNum)
{
FRESULT rc; /* Result code */
DIRS dir; /* Directory object */
FILINFO fno; /* File information object */
uint8_t dirCnt = 0; /* track current directory count */
uint8_t fileCnt = 0; /* track current directory count */
rc = f_opendir(&dir, directoryName);
for (;;)
{
rc = f_readdir(&dir, &fno); // Read a directory item
if (rc || !fno.fname[0]) break; // Error or end of dir
if (fno.fattrib & AM_DIR) //this is a directory
{
strcat(*dirList, fno.fname); //add this to our list of names
dirCnt++;
dirList++;
}
else //this is a file
{
strcat(*fileList, fno.fname); //add this to our list of names
fileCnt++;
fileList++;
}
}
*dirNum = dirCnt;
*fileNum = fileCnt;
return rc;
}
Below is SDCardLib_createDirectory(SDCardLib *lib, char *directoryName). It just creates a directory, it does not check on the existing number of files.
FRESULT SDCardLib_createDirectory(SDCardLib * lib, char * directoryName)
{
return f_mkdir(directoryName);
}
So coming back to my questions:
Did I understand this code correctly, does it really does limit the number of directories and files to 10 each?
If so, why would the number of files and directories be so limited? The particular MSP430 that this example code came with has 256KB of program space and 8KB of RAM. The compiled code consumes less than half of the available resources (68KB of program space and about 2.5KB of RAM). Is it because any larger would overflow the stack segment?
I want to increase the number of files that can be stored. If I look at the underlying FATFS code, it does not to impose a limit on the number of files or directories (at least not until the sd card is full). If I never intend to display or search the contents of a directory on the MSP430 my thought is to remove SDCard_getDirectory() and the two char arrays (files and dirs). Would there a reason why this would be a bad idea?

There are other microcontrollers with less memory.
The SDCardLib_getDirectory() function treats its dirList and fileList parameters as simple strings, i.e., it calls strcat() on the same pointers. This means that it can read as many names as fit into 10*8 or 10*12 bytes.
And calling strcat() without adding a delimiter means that it is impossible to get the individual names out of the string.
This codes demonstrates that it is possible to use the FatFs library, and in which order its functions need to be called, but it is not necessarily a good example of how to do that. I recommend that you write your own code.

Simple C Kernel char Pointers Aren't Working

I am trying to make a simple kernel using C. Everything loads and works fine, and I can access the video memory and display characters, but when i try to implement a simple puts function for some reason it doesn't work. I've tried my own code and other's. Also, when I try to use a variable which is declared outside a function it doesn't seem to work. This is my own code:
#define PUTCH(C, X) pos = putc(C, X, pos)
#define PUTSTR(C, X) pos = puts(C, X, pos)
int putc(char c, char color, int spos) {
volatile char *vidmem = (volatile char*)(0xB8000);
if (c == '\n') {
spos += (160-(spos % 160));
} else {
vidmem[spos] = c;
vidmem[spos+1] = color;
spos += 2;
}
return spos;
}
int puts(char* str, char color, int spos) {
while (*str != '\0') {
spos = putc(*str, color, spos);
str++;
}
return spos;
}
int kmain(void) {
int pos = 0;
PUTSTR("Hello, world!", 6);
return 0;
}
The spos (starting position) stuff is because I can't make a global position variable. putc works fine, but puts doesn't. I also tried this:
unsigned int k_printf(char *message, unsigned int line) // the message and then the line #
{
char *vidmem = (char *) 0xb8000;
unsigned int i=0;
i=(line*80*2);
while(*message!=0)
{
if(*message=='\n') // check for a new line
{
line++;
i=(line*80*2);
*message++;
} else {
vidmem[i]=*message;
*message++;
i++;
vidmem[i]=7;
i++;
};
};
return(1);
};
int kmain(void) {
k_printf("Hello, world!", 0);
return 0;
}
Why doesn't this work? I tried using my puts implementation with my native GCC (without the color and spos data and using printf("%c")) and it worked fine.

Since you're having an issue with global variables in general, the problem most likely has to-do with where the linker is placing your "Hello World" string literal in memory. This is due to the fact that string literals are typically stored in a read-only portion of global memory by the linker ... You have not detailed exactly how you are compiling and linking your kernel, so I would attempt something like the following and see if that works:
int kmain(void)
{
char array[] = "Hello World\n";
int pos = 0;
puts(array, 0, pos);
return 0;
}
This will allocate the character array on the stack rather than global memory, and avoid any issues with where the linker decides to place global variables.
In general, when creating a simple kernel, you want to compile and link it as a flat binary with no dependencies on external OS libraries. If you're working with a multiboot compliant boot-loader like GRUB, you may want to look at the bare-bones sample code from the multiboot specification pages.

Since this got references outside of SO, I'll add a universal answer
There are several kernel examples around the internet, and many are in various states of degradation - the Multiboot sample code for instance lacks compilation instructions. If you're looking for a working start, a known good example can be found at http://wiki.osdev.org/Bare_Bones
In the end there are three things that should be properly dealt with:
The bootloader will need to properly load the kernel, and as such they must agree on a certain format. GRUB defines the fairly common standard that is Multiboot, but you can roll your own. It boils down that you need to choose a file format and locations where all the parts of your kernel and related metadata end up in memory before the kernel code will ever get executed. One would typically use the ELF format with multiboot which contains that information in its headers
The compiler must be able to create binary code that is relevant to the platform. A typical PC boots in 16-bit mode after which the BIOS or bootloader might often decide to change it. Typically, if you use GRUB legacy, the Multiboot standard puts you in 32-bit mode by its contract. If you used the default compiler settings on a 64-bit linux, you end up with code for the wrong architecture (which happens to be sufficiently similar that you might get something that looks like the result you want). Compilers also like to rename sections or include platform-specific mechanisms and security features such as stack probing or canaries. Especially compilers on windows tend to inject host-specific code that of course breaks when run without the presence of Windows. The example provided deliberately uses a separate compiler to prevent all sorts of problems in this category.these
The linker must be able to combine the code in ways needed to create output that adheres to the bootloader's contract. A linker has a default way of generating a binary, and typically it's not at all what you want. In pretty much all cases, choosing gnu ld for this task means that you're required to write a linker script that puts all the sections in the places where you want. Omitted sections will result in data going missing, sections at the wrong location might make an image unbootable. Assuming you have gnu ld, you can also use the bundled nm and objdump tools besides your hex editor of choice to tell you where things have appeared in your output binary, and with it, check if you've been following the contract that has been set for you.
Problems of this fundamental type are eventually tracked back to not following one or more of the steps above. Use the reference at the top of this answer and go find the differences.

How to view Linux memory map info in C?

I'm dynamically loading some Linux libraries in C.
I can get the start addresses of the libraries using the
dlinfo
(see 1).
I can't find any information to get the size of a library, however.
The only thing that I've found is that one must read the
/proc/[pid]/maps
file and parse it for the relevant information (see 2).
Is there a more elegant method?

(This answer is LINUX/GLIBC specific)
According to http://s.eresi-project.org/inc/articles/elf-rtld.txt
there are link_map *map; map->l_map_start & map->l_map_end
/*
** Start and finish of memory map for this object.
** l_map_start need not be the same as l_addr.
*/
ElfW(Addr) l_map_start, l_map_end;
It is a bit not exact, as said here http://www.cygwin.com/ml/libc-hacker/2007-06/msg00014.html
= some libraries are not continous in memory; the letter linked has some examples... e.g. this is the very internal (to rtld) function to detect is the given address inside lib's address space or not, based on link_map and direct working with ELF segments:
/* Return non-zero if ADDR lies within one of L's segments. */
int
internal_function
_dl_addr_inside_object (struct link_map *l, const ElfW(Addr) addr)
{
int n = l->l_phnum;
const ElfW(Addr) reladdr = addr - l->l_addr;
while (--n >= 0)
if (l->l_phdr[n].p_type == PT_LOAD
&& reladdr - l->l_phdr[n].p_vaddr >= 0
&& reladdr - l->l_phdr[n].p_vaddr < l->l_phdr[n].p_memsz)
return 1;
return 0;
}
And this function is the Other alternative, which is to find program headers/ or section headers of ELF loaded (there are some links to such information in link_map)
And the easiest is to use some stat syscall with map->l_name - to read file size from the disk (inexact in detecting huge bss section).

Parsing /proc/self/maps (or perhaps popen-ing a pmap command) seems still the easiest thing to me. And there is also the dladdr function (provided you have some adress to start with).

How to get a pointer to a binary section in MSVC?

I'm writing some code which stores some data structures in a special named binary section. These are all instances of the same struct which are scattered across many C files and are not within scope of each other. By placing them all in the named section I can iterate over all of them.
In GCC, I use _attribute_((section(...)) plus some specially named extern pointers which are magically filled in by the linker. Here's a trivial example:
#include <stdio.h>
extern int __start___mysection[];
extern int __stop___mysection[];
static int x __attribute__((section("__mysection"))) = 4;
static int y __attribute__((section("__mysection"))) = 10;
static int z __attribute__((section("__mysection"))) = 22;
#define SECTION_SIZE(sect) \
((size_t)((__stop_##sect - __start_##sect)))
int main(void)
{
size_t sz = SECTION_SIZE(__mysection);
int i;
printf("Section size is %u\n", sz);
for (i=0; i < sz; i++) {
printf("%d\n", __start___mysection[i]);
}
return 0;
}
I'm trying to figure out how to do this in MSVC but I'm drawing a blank. I see from the compiler documentation that I can declare the section using __pragma(section(...)) and declare data to be in that section with __declspec(allocate(...)) but I can't see how I can get a pointer to the start and end of the section at runtime.
I've seen some examples on the web related to doing _attribute_((constructor)) in MSVC, but it seems like hacking specific to CRT and not a general way to get a pointer to the beginning/end of a section. Anyone have any ideas?

There is also a way to do this with out using an assembly file.
#pragma section(".init$a")
#pragma section(".init$u")
#pragma section(".init$z")
__declspec(allocate(".init$a")) int InitSectionStart = 0;
__declspec(allocate(".init$z")) int InitSectionEnd = 0;
__declspec(allocate(".init$u")) int token1 = 0xdeadbeef;
__declspec(allocate(".init$u")) int token2 = 0xdeadc0de;
The first 3 line defines the segments. These define the sections and take the place of the assembly file. Unlike the data_seg pragma, the section pragma only create the section.
The __declspec(allocate()) lines tell the compiler to put the item in that segment.
From the microsoft page:
The order here is important. Section names must be 8 characters or less. The sections with the same name before the $ are merged into one section. The order that they are merged is determined by sorting the characters after the $.
Another important point to remember are sections are 0 padded to 256 bytes. The START and END pointers will NOT be directly before and after as you would expect.
If you setup your table to be pointers to functions or other none NULL values, it should be easy to skip NULL entries before and after the table, due to the section padding
See this msdn page for more details

First of all, you'll need to create an ASM-file containing all the sections you are interested (for ex., section.asm):
.686
.model flat
PUBLIC C __InitSectionStart
PUBLIC C __InitSectionEnd
INIT$A SEGMENT DWORD PUBLIC FLAT alias(".init$a")
__InitSectionStart EQU $
INIT$A ENDS
INIT$Z SEGMENT DWORD PUBLIC FLAT alias(".init$z")
__InitSectionEnd EQU $
INIT$Z ENDS
END
Next, in your code you can use the following:
#pragma data_seg(".init$u")
int token1 = 0xdeadbeef;
int token2 = 0xdeadc0de;
#pragma data_seg()
This gives such a MAP-file:
Start Length Name Class
0003:00000000 00000000H .init$a DATA
0003:00000000 00000008H .init$u DATA
0003:00000008 00000000H .init$z DATA
Address Publics by Value Rva+Base Lib:Object
0003:00000000 ?token1##3HA 10005000 dllmain.obj
0003:00000000 ___InitSectionStart 10005000 section.obj
0003:00000004 ?token2##3HA 10005004 dllmain.obj
0003:00000008 ___InitSectionEnd 10005008 section.obj
So, as you can see it, the section with the name .init$u is placed between .init$a and .init$z and this gives you ability to get the pointer to the begin of the data via __InitSectionStart symbol and to the end of data via __InitSectionEnd symbol.

I was experimenting here a bit and tried to implement the version without an assembly file, however was struggling with the random number of padding bytes between the sections, which makes it almost impossible to find the start of the .init$u section part if content isn't just pointers or other simple items that could be checked for NULL or some other known pattern.
Whether padding is inserted seems to correlate with the use of debug option Zi. When given, padding is inserted, without, all sections appear exactly in the way one would like to have them.

ML64 allows to cut a lot of the assembly noise :
public foo_start
public foo_stop
.code foo$a
foo_start:
.code foo$z
foo_stop:
end

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Finding the address range of the data segment - c

As a programming exercise, I am writing a mark-and-sweep garbage collector in C. I wish to scan the data segment (globals, etc.) for pointers to allocated memory, but I don't know how to get the range of the addresses of this segment. How could I do this?

Since you'll probably have to make your garbage collector the environment in which the program runs, you can get it from the elf file directly.

Load the file that the executable came from and parse the PE headers, for Win32. I've no idea about on other OSes. Remember that if your program consists of multiple files (e.g. DLLs) you may have multiple data segments.

For iOS you can use this solution. It shows how to find the text segment range but you can easily change it to find any segment you like.

Related

Function address in a Position Independent Executable

MSP430 SD card application running on top of FATFS appears too restrictive . Is my understanding correct?

Simple C Kernel char Pointers Aren't Working

How to view Linux memory map info in C?

How to get a pointer to a binary section in MSVC?

Categories

Resources